Flips security.redact_secrets from true to false in DEFAULT_CONFIG, and
the HERMES_REDACT_SECRETS env-var fallback in agent/redact.py now
requires explicit opt-in ("1"/"true"/"yes"/"on") to enable.
New installs and users without a security.redact_secrets key get pass-
through tool output. Existing users whose config.yaml explicitly sets
redact_secrets: true keep redaction on — the config-yaml -> env-var
bridges in hermes_cli/main.py and gateway/run.py still honor their
setting.
Also updates the inline config comments, website docs, and the
hermes-agent skill so /hermes config set security.redact_secrets true
is now the documented way to turn it on.
Plugins can now observe dangerous-command approval events in real time,
on both the CLI-interactive path and the async gateway path. This is the
missing hook surface external tools need to build approval notifiers
(macOS menu-bar allow/deny, Slack alerts, audit logs, etc.) without
forking Hermes or running a parallel gateway adapter.
Changes:
- hermes_cli/plugins.py: add two entries to VALID_HOOKS
- tools/approval.py: fire both hooks from check_all_command_guards --
around prompt_dangerous_approval (CLI surface) and around the
notify_cb + blocking event.wait loop (gateway surface)
- website/docs/user-guide/features/hooks.md: document both hooks with
a macOS-notification example
- tests/tools/test_approval_plugin_hooks.py: 5 tests covering CLI once,
CLI deny, plugin-crash resilience, gateway approve, gateway timeout
Hooks are observer-only: return values are ignored, so plugins cannot
veto or pre-answer an approval (use pre_tool_call for that). A crashing
plugin cannot break the approval flow -- invoke_hook swallows per-
callback errors, and the wrapper logs and swallows dispatch-layer
errors too.
Surface kwarg distinguishes "cli" from "gateway"; post hook reports
choice as one of once/session/always/deny/timeout.
- moveCursor(extend=true) now collapses to the bare cursor when the
computed offset equals the existing anchor instead of leaving a
zero-length sel. Without this, Shift+Left at col 0 / Shift+Home at
start would silently hide the hardware cursor (selected truthy)
without rendering any highlight.
- _tui_need_npm_install also catches UnicodeDecodeError so a corrupted
/ non-UTF8 lockfile falls back to the mtime path the docstring
promises instead of crashing.
Made-with: Cursor
* feat(tui): auto copy-on-select for transcript text
Drag in the transcript already highlighted but you had to press Cmd+C to
land it on the clipboard, and the highlight cleared on copy — most users
never realised selection existed. Now drag-release fires copySelectionNoClear
so the text is on the clipboard immediately while the highlight stays put,
matching iTerm2's "Copy to pasteboard on selection" default. Esc clears.
Behaviour:
- Single click in the input still positions the cursor (TextInput onClick).
- Single click in the transcript still does nothing destructive.
- Double / triple click select word / line, then drag extends.
- /copyselect [on|off|toggle] (alias /cos) flips the setting at runtime,
HERMES_TUI_DISABLE_COPY_ON_SELECT=1 disables at startup, persists via
display.tui_copy_on_select in config.yaml.
Help overlay now lists drag-select, multi-click, and click-to-position
so the gestures are discoverable.
Made-with: Cursor
* fix(tui): support prompt text selection gestures
Add mouse drag selection and Shift+Arrow/Home/End extension inside the TUI composer so prompt text behaves like a normal editable field while keeping click-to-position and right-click paste intact.
Made-with: Cursor
* Revert "feat(tui): auto copy-on-select for transcript text"
This reverts commit 6701288fe07a53af873e1ef53855a9618d733327.
* fix(tui): allow composer selection from prompt whitespace
Give the composer a one-cell mouse capture pad before the editable text. The prompt glyph/gutter still does not become selectable, but dragging from the edge now anchors at input offset 0 so users do not need to hit the first character precisely.
Made-with: Cursor
* fix(tui): clear selections from blank composer space
Clicking blank space in the transcript or composer now clears active TUI/input selections like a normal text surface. TextInput clicks stop bubbling so cursor placement and selection gestures keep their local behavior.
Made-with: Cursor
* fix(tui): delegate prompt gutter drags to composer text
The prompt gutter is now an input gesture region, not selectable content. Dragging from the whitespace or prompt area anchors the composer selection at offset 0, while selection highlight/copy remains limited to actual input text.
Made-with: Cursor
* fix(tui): move composer cursor to end on selection clear
External clear actions now collapse the composer selection to the end of the input, matching normal text-field behavior after dismissing a selection.
Made-with: Cursor
* fix(tui): capture composer padding before prompt
Add an explicit mouse capture cell over the left padding before the prompt glyph. Drags starting there now delegate to the composer input at offset 0 instead of starting terminal-level selection over the prompt chrome.
Made-with: Cursor
* fix(tui): avoid npm install on lockfile mtime churn
Compare package-lock.json against npm's hidden node_modules lock by content instead of mtimes. Git checkouts and npm lock rewrites can make the root lockfile newer even when installed dependencies already match, causing hermes --tui to print Installing TUI dependencies on every launch.
Made-with: Cursor
* fix(tui): include prompt leading cell in gesture region
Use the prompt box's real layout region to cover the leading whitespace cell before the glyph. The cell now participates in mouse hit testing and delegates to composer selection instead of starting terminal-level selection.
Made-with: Cursor
* fix(tui): widen prompt-side gesture capture band
Capture a wider left-side band around the composer prompt row so drags starting in terminal gutter/padding cells are consumed and delegated to input selection, instead of triggering terminal-level selection chrome.
Made-with: Cursor
* fix(tui): make pre-prompt spacer non-selectable content
Replace the sticky-prompt fallback `Text(' ')` with an empty spacer box so the visual gap remains but no literal space character is rendered/copyable before the composer prompt.
Made-with: Cursor
* fix(tui): capture pre-prompt spacer without shifting prompt layout
Revert the widened negative-margin prompt capture band and instead capture drags on the dedicated spacer row above the prompt. This keeps prompt/text alignment stable while still delegating whitespace-start drags to composer selection.
Made-with: Cursor
* fix(tui): align prompt with status bar and capture full input row
Drop the leading prompt column from 3 to 2 so the input first character lines up with the status bar text. Wrap the prompt+input row in a single mouse-capture box and stop event propagation from TextInput's own handlers so any drag in that row delegates to composer selection without leaking to terminal-level selection.
Made-with: Cursor
* fix(tui): anchor hardware cursor during composer selection
When a composer selection covers a row exactly the column width, the rendered text fills the row and the terminal auto-wraps the hardware cursor to col 0 of the next row, leaving a ghost block beneath the prompt. Park the cursor at the start of the input box during selection so it can't escape the input region.
Made-with: Cursor
* fix(tui): hide hardware cursor during composer selection
Stop fighting auto-wrap by hiding the hardware cursor outright while the
composer has an active selection. This prevents both the ghost block under
the prompt (cursor wrapping past the last cell) and the parked-cursor block
on the first selected character. The cursor restores as soon as the
selection clears or focus changes.
Made-with: Cursor
* chore(tui): /clean — drop dead capture-pad path, dedupe gutter handlers
- TextInput: remove unused leftCaptureColumns prop and capture-pad math, drop
unused mouseApi.startAt, fold mouse offset into a single offsetAt helper,
share a MouseEventLite type across the four handlers.
- appLayout: hoist a GutterMouseEvent type and an endInputDrag callback so the
spacer/prompt/input rows share one shape.
- _tui_need_npm_install: lift the runtime-only key set to a module constant,
collapse nested isinstance checks, and document the mtime fallback.
Made-with: Cursor
* fix(tui): address copilot review on PR #16732
- Split InputSelection.clear() into clear() (cursor-preserving) and
collapseToEnd() (clear + jump to end). Cmd+C copy paths keep using
clear() so the cursor stays put; the blank-area click in useMainApp
switches to collapseToEnd() to match the requested UX.
- Spacer-row drags now force row=0 when forwarding into the input,
since the spacer's vertical origin doesn't align with the input box
and Ink mouse-capture keeps dispatching motion to the original
target. Prompt+input row drag keeps localRow because origins match.
Made-with: Cursor
* fix(tui): give TextInput Box an explicit width
After the /clean pass dropped the unused capture-pad math, the wrapping
Box also lost its explicit width and started sizing to its rendered
content. Clicks past the last character missed TextInput and fell
through to the parent prompt-row Box, which collapsed the cursor to
offset 0. Pin the Box back to `columns` so the input owns its full
column span regardless of value length.
Made-with: Cursor
* feat(tui): double-click select-all + hide cursor on terminal blur
- Track click time/offset in TextInput so a quick second click on the
same offset triggers select-all. Ink's screen-level multi-click is
bypassed once our onMouseDown captures, so the gesture has to be
detected locally.
- Extend the cursor-hide effect to also fire when the terminal loses
focus, so the hollow-rect ghost most terminals draw at the parked
cursor position disappears too.
Made-with: Cursor
* chore(tui): /clean — extract isMultiClickAt helper
Pull the click-recurrence math out of TextInput's onMouseDown into a
small isMultiClickAt(offset) helper so the handler reads as the gesture
list it actually is (multi-click → select-all, otherwise start).
Drop the redundant length>0 guard now that selectAll() already noops on
an empty value.
Made-with: Cursor
* docs(tui): explain _tui_need_npm_install content-vs-mtime comparison
Expand the docstring so future readers understand why we parse the
lockfiles instead of comparing mtimes, what the optional/peer skip
covers, how stale hidden-lock entries are handled, and when we fall
back to mtime.
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
is 22, so all users are past version 17 and would never be prompted for
GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
(was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
- create HERMES_TUI_ACTIVE_SESSION_FILE with mkstemp instead of a predictable tmp path and always cleanup in finally
- add assertions that launch wiring uses a randomized session file path and removes it on exit
- use a grouped last_active join in search_sessions to avoid per-row correlated max lookups
- always close SessionDB in _resolve_last_session via finally and add regression coverage for search failure cleanup
- order session listing by computed last_active in SessionDB so callers get MRU rows directly
- keep _resolve_last_session as a single-row lookup and add regression coverage for >20 session sampling
The backup takes a consistent snapshot of each .db via sqlite3.backup(),
so shipping the live .db-wal / .db-shm / .db-journal alongside pairs the
fresh snapshot with stale sidecar state and produces a torn restore on
first open. Sidecars are transient and SQLite regenerates them on next
connection anyway.
This also trims multi-MB of junk from every zip — state.db-wal alone was
~9 MB here, doubled by the fact the WAL is the live write-ahead log, not
data.
Session-local trajectory cache — keyed by session hash, regenerated
per-session, won't port to another machine anyway. On a large install
this was multiple GB of pure noise in every zip.
Also adds a regression test for the pre-existing backups/ exclusion
so the two machine-local dirs share coverage.
The zip backup could add minutes to every 'hermes update' on large
HERMES_HOME directories. Flip the default to off and add a --backup
flag for one-off opt-in runs.
- updates.pre_update_backup default: True -> False
- hermes update: new --backup flag (opposite of existing --no-backup)
- Silent no-op when disabled (no message spam on every update)
- Existing --no-backup still works and wins over --backup
- Users who explicitly set pre_update_backup: true keep the old behavior
- Tests updated to cover default-off, --backup opt-in, and config-enabled paths
* feat(image-input): native multimodal routing based on model vision capability
Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.
Routing decision (agent/image_routing.py::decide_image_input_mode):
agent.image_input_mode = auto | native | text (default: auto)
In auto mode:
- If auxiliary.vision.provider/model is explicitly configured, keep the
text pipeline (user paid for a dedicated vision backend).
- Else if models.dev reports supports_vision=True for the active
provider/model, attach natively.
- Else fall back to text (current behaviour).
Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).
run_agent.py changes:
- _prepare_anthropic_messages_for_api now passes image parts through
unchanged when the model supports vision — the Anthropic adapter
translates them to native image blocks. Previous behaviour
(vision_analyze → text) only runs for non-vision Anthropic models.
- New _prepare_messages_for_non_vision_model mirrors the same contract
for chat.completions and codex_responses paths, so non-vision models
on any provider get text-fallback instead of failing at the provider.
- New _model_supports_vision() helper reads models.dev caps.
vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.
Config default: agent.image_input_mode = auto.
Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).
* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor
Two follow-ups that make the native image routing safer for long / heavy
sessions:
1) Oversize handling in build_native_content_parts:
- 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
the most restrictive provider — Gemini inline data).
- Delegates to vision_tools._resize_image_for_vision (Pillow-based,
already battle-tested) to downscale to 5 MB first-try.
- If Pillow is missing or resize still overshoots, the image is
dropped and reported back in skipped[]; caller falls back to text
enrichment for that image.
2) Image-token accounting in context_compressor:
- New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
within the realistic range for Anthropic/GPT-4o/Gemini billing).
- _content_length_for_budget() helper: sums text-part lengths and
charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
input_image part. Base64 payload inside image_url is NOT counted
as chars — dimensions don't matter, only image-presence.
- Both tail-cut sites (_prune_old_tool_results L527 and
_find_tail_cut_by_tokens L1126) now call the helper so multi-image
conversations don't slip past compression budget.
Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).
* fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits
The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:
* Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
'image exceeds 5 MB maximum' above that).
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
complaint (empirically verified April 2026 with progressive PNG
sizes).
New behaviour:
* _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
* Providers NOT in the table get no ceiling — images attach at native
size and we trust the provider to return its own error if it
disagrees. A provider-specific 400 message is clearer than us
guessing wrong and silently degrading image quality.
* build_native_content_parts() gains a keyword-only provider arg;
gateway/CLI/TUI pass the active provider so Anthropic users get
auto-resize protection while OpenAI users don't pay it.
* Resize target dropped from 5 MB to 4 MB to slide safely under
Anthropic's boundary with header overhead.
Empirical measurements (direct API, no Hermes in the loop):
image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5
0.19 MB ✓ ✓ ✓
12.37 MB ✗ 400 5MB ✓ ✓
23.85 MB ✗ 400 5MB ✓ ✓
49.46 MB ✗ 413 ✓ ✓
Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.
* refactor(image-input): attempt native, shrink-and-retry on provider reject
Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.
Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.
Reactive design:
- image_routing.py: _file_to_data_url encodes native size, no ceiling.
build_native_content_parts drops its provider kwarg.
- error_classifier.py: new FailoverReason.image_too_large + pattern
match ("image exceeds", "image too large", etc.) checked BEFORE
context_overflow so Anthropic's 5MB rejection lands in the right
bucket.
- run_agent.py: new _try_shrink_image_parts_in_messages walks api
messages in-place, re-encodes oversized data: URL image parts
through vision_tools._resize_image_for_vision to fit under 4MB,
handles both chat.completions (dict image_url) and Responses
(string image_url) shapes, ignores http URLs (provider-fetched).
New image_shrink_retry_attempted flag in the retry loop fires the
shrink exactly once per turn after credential-pool recovery but
before auth retries.
E2E verified live against Anthropic claude-sonnet-4-6:
- 17.9MB PNG (23.9MB b64) attached at native size
- Anthropic returns 400 "image exceeds 5 MB maximum"
- Agent logs '📐 Image(s) exceeded provider size limit — shrank and
retrying...'
- Retry succeeds, correct response delivered in 6.8s total.
Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
Every 'hermes update' now runs a full backup of ~/.hermes/ first, so
users can always roll back to the exact state they had before the
update if anything goes wrong (corrupted sessions.db, broken skills,
config migrations that don't round-trip, etc.).
Changes:
- hermes_cli/backup.py: new create_pre_update_backup() helper. Writes
to <HERMES_HOME>/backups/pre-update-<stamp>.zip using the same
exclusion rules and SQLite safe-copy as 'hermes backup'. Auto-rotates
(keep last N, pre-update-*.zip only — hand-dropped zips in backups/
are untouched). Adds 'backups' to _EXCLUDED_DIRS so subsequent backups
don't nest prior ones.
- hermes_cli/main.py: _run_pre_update_backup() wired into
_cmd_update_impl before any git operation. Prints save path, restore
command, and how to disable. Swallows failures so a broken backup
never blocks the update itself. New --no-backup flag on 'hermes
update' for one-off override.
- hermes_cli/config.py: new 'updates' section in DEFAULT_CONFIG with
pre_update_backup (default true) and backup_keep (default 5).
Auto-surfaces in the dashboard config UI.
- tests/hermes_cli/test_backup.py: +11 tests covering backup location,
content parity with 'hermes backup', no-recursion, rotation, manual
file preservation, config gate, --no-backup flag, flag-wins-over-config.
The CLI renders through prompt_toolkit in non-full-screen mode, so every
repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase
before drawing the new frame. Any time that tracked position drifts from
terminal reality, redraws stack on top of stale content instead of
overwriting it. Four user-visible bugs share this root cause.
Fixes:
- #5474 (SIGWINCH ghosts): the resize wrapper previously only handled
column-shrink reflow. Generalize it to force a full screen-clear
(erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize
— covers widen, row-shrink, and multiplexer SIGWINCH-less redraws.
- #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so
prompt_toolkit has no signal to recover. Add a _force_full_redraw()
helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed
as /redraw. Users can manually clear drift without restarting Hermes.
- #14692 (DSR response leaks — ^[[53;1R): resize storms make
prompt_toolkit's CSI 6n queries race past the input parser; the
terminal's reply ends up as literal input text. Add a sibling of the
bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the
caret-escape visible form from paste text, buffer text-filter, and
the input-processing loop.
The idle-redraw removal (#12641) is in the preceding commit from
@foxion37 — keeping them as separate commits preserves attribution.
Previously 'hermes debug share' uploads only got DELETEd when the user
ran 'hermes debug share' again — opportunistic-sweep-on-invoke was the
only cleanup path. A user who uploaded once and never ran debug again
left pastes up until paste.rs's retention kicked in (which, empirically,
never actually expires them).
Hook _sweep_expired_pastes into the gateway cron ticker at the same
hourly cadence as the image/document cache cleanups. The opportunistic
sweep in 'hermes debug share' stays as a fallback for CLI-only users
who never start the gateway.
Quick state snapshot now includes pairing JSONs (generic + legacy +
Feishu comment pairing), and `hermes update` takes a pre-update
snapshot labeled `pre-update` before pulling.
Pairing data lives outside state.db in platform-specific JSONs under
~/.hermes/pairing/, ~/.hermes/platforms/pairing/, and
~/.hermes/feishu_comment_pairing.json. The update command already
couldn't touch $HERMES_HOME, but #15733 reports lost pairing after
an update — this gives users something to restore from via
`/snapshot list` / `/snapshot restore <id>` if anything clobbers
the approved-user lists.
- Extend _QUICK_STATE_FILES with pairing paths (files + dirs)
- Snapshot walks directories recursively and records each file in the
manifest individually so restore logic is unchanged
- _cmd_update_impl calls create_quick_snapshot(label='pre-update')
after 'Found N new commits' and before 'Pulling updates'
- Snapshot failures are logged at debug and never block the update
Refs #15733.
When 'hermes model' runs against a providers: (keyed-schema) entry that
relies only on key_env, the picker resolves the env var for the live
/models request and then wrote a synthesized 'api_key: ${KEY_ENV}' back
to the providers.<key> entry. That's redundant — the runtime already
resolves from key_env directly — and it clutters configs that
intentionally keep credentials out of config.yaml.
Only persist provider_entry['api_key'] when the user originally had an
inline value (literal secret or ${VAR} template). Entries that declared
only key_env stay clean on save.
Fixes#15803.
Azure Foundry deploys GPT-5.x, codex-*, and o1/o3/o4 reasoning models as
Responses-API-only. Calling /chat/completions against these deployments
returns 400 'The requested operation is unsupported.', which broke any
user who ran 'hermes model' on Azure, picked a gpt-5/codex deployment,
and kept the default api_mode: chat_completions. Verified in a user
debug bundle on 2026-04-26: gpt-5.3-codex failed on synopsisse.openai.azure.com
with that exact payload while gpt-4o-pure on the same endpoint worked.
Adds azure_foundry_model_api_mode(model_name) that returns
codex_responses when the model name starts with gpt-5, codex, o1, o3,
or o4 — otherwise None so chat_completions / anthropic_messages stay
untouched for gpt-4o, Llama, Claude-via-Anthropic, etc.
Resolver (both the direct Azure Foundry path and the pool-entry path)
consults it and upgrades api_mode unless the user explicitly picked
anthropic_messages. target_model (from /model mid-session switch)
takes precedence over the persisted default so switching from gpt-4o
to gpt-5.3-codex routes correctly before the next request.
Docs: correct the azure-foundry guide which previously claimed Azure
keeps gpt-5.x on chat completions — that was only true for early Azure
OpenAI, not Azure Foundry codex/o-series deployments.
Tests: 14 unit tests for azure_foundry_model_api_mode + 6 integration
tests in TestAzureFoundryResolution covering Bob's exact scenario,
target_model override, anthropic_messages guard, and o3-mini.
* feat(skills): install skills from a direct HTTP(S) URL
Adds UrlSource adapter so `hermes skills install <url-to-SKILL.md>` and
`/skills install <url>` work as first-class operations — no more
improvising with curl + patch + cp.
- Claims identifiers that start with http(s):// and end in .md
- Skips /.well-known/skills/ URLs (WellKnownSkillSource handles those)
- Skill name from YAML frontmatter, URL-slug fallback
- Single-file SKILL.md only (v1 scope — multi-file skills need a manifest)
- Trust level 'community'; full security scan still runs
- Lock file stores the URL as identifier so `hermes skills update`
re-fetches from the same URL cleanly
Scope matches real user need from @versun's docx feedback where
`https://sharethis.chat/SKILL.md` had no first-class install path.
* feat(skills): interactive name/category for URL installs + --name override
Follow-up to the UrlSource adapter. The previous commit fell back to weak
heuristics when frontmatter had no ``name:`` and could produce garbage names
like ``SKILL`` or ``unnamed-skill``. Now:
tools/skills_hub.py
- ``UrlSource._is_valid_skill_name()`` — strict identifier check
(``^[a-z][a-z0-9_-]*$``), rejects sentinel values (``SKILL``, ``README``,
``INDEX``, ``unnamed-skill``, empty, non-strings).
- ``_resolve_skill_name()`` returns ``Optional[str]`` — ``None`` when
nothing valid is resolvable. Also ignores unsafe frontmatter names
(``../evil``) and falls through to URL slug instead of returning None
immediately, so a URL with a bad frontmatter but a good path still
works.
- ``fetch()``/``inspect()`` carry an ``awaiting_name=True`` marker in
metadata/extra when resolution fails, letting ``do_install`` decide
whether to prompt, apply an override, or error out.
hermes_cli/skills_hub.py
- ``do_install`` gains a ``name_override`` parameter.
- On URL-sourced bundles with ``awaiting_name=True``:
1. If ``name_override`` is valid → use it.
2. If ``name_override`` is invalid → refuse with a clear error.
3. Else if ``skip_confirm=True`` (non-interactive: slash / TUI /
gateway / scripts) → refuse with an actionable retry hint pointing
at ``--name <your-name>`` on both CLI and slash forms.
4. Else (interactive TTY) → prompt for the name.
- Interactive TTY also prompts for a category when none is given for a
URL-sourced install, hinting existing category buckets so users can
reuse ``productivity``, ``devops``, etc. Empty input → flat install.
- ``_existing_categories()`` scans ``~/.hermes/skills/`` for subdirs that
look like category buckets (contain nested SKILL.md files); skips
top-level skills and hidden dirs.
- ``_prompt_for_skill_name()`` / ``_prompt_for_category()`` helpers
(EOF/Ctrl-C-safe, match the existing ``Confirm [y/N]`` prompt style).
hermes_cli/main.py
- ``hermes skills install`` argparse gains ``--name <name>``.
hermes_cli/skills_hub.py (slash)
- ``/skills install <url> --name <x>`` parsing added.
Tests
- tests/tools/test_skills_hub.py: updated ``UrlSource`` tests to assert
the new ``awaiting_name`` metadata; added 4 new tests for
``_is_valid_skill_name`` rejection sets and the awaiting-name marker.
- tests/hermes_cli/test_skills_hub.py: 8 new tests covering --name
override accept/reject, non-interactive error, interactive name prompt,
interactive category prompt, cancel-aborts-install, and
``_existing_categories`` scan behavior (buckets vs flat skills).
- E2E verified all four paths (no-name/no-override → error;
--name override → install; frontmatter name → install;
invalid --name → rejection).
---------
Co-authored-by: teknium1 <teknium@noreply.github.com>
Both get_provider_request_timeout() and get_provider_stale_timeout()
wrapped the load_config import in try/except ImportError but left the
actual load_config() call unprotected. A corrupt config file, YAML
parse error, or permission failure would raise instead of returning
None safely.
Move load_config() inside the try block so any exception returns None.
- remove the temporary -c MRU logic and companion test from this branch so PR #15926 stays focused on TUI perf work
- keep the resume-ordering change isolated in the dedicated follow-up PR
CPU profiling showed the built TUI loading React development modules unless NODE_ENV was set. Default CLI and dashboard TUI children to production while preserving explicit user overrides.
Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/. The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever. Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.
Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from #13861 / #16286:
- tools/checkpoint_manager.py: new prune_checkpoints() and
maybe_auto_prune_checkpoints() helpers. Deletes shadow repos that
are orphan (HERMES_WORKDIR marker points to a path that no longer
exists) or stale (newest in-repo mtime older than retention_days).
Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
runs once per min_interval_hours regardless of how many hermes
processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
retention_days / delete_orphans / min_interval_hours knobs.
Default auto_prune: false so users who rely on /rollback against
long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
tracking, non-shadow dir skip, interval gating, corrupt marker
recovery.
Refs #3015 (session-file disk growth was fixed in #16286; this covers
the checkpoint side noted out-of-scope there).
Follow-up to #15960 — the provider-active detection in tools_config.py
also read use_gateway with raw truthiness (is False, not dict.get), so
quoted 'false' caused the FAL-direct row to show wrong active status in
the hermes tools picker. Route both sites through is_truthy_value().
`npm install --silent` (used by `_build_web_ui` and `_update_node_dependencies`)
silently rewrites package-lock.json on npm ≥ 10 (strips "peer": true etc.),
leaving the working tree dirty after every `hermes update`. The next update
then detects the dirty lockfile and stashes it — producing a trail of
hermes-update-autostash entries for web/package-lock.json, ui-tui/package-lock.json,
and root package-lock.json.
Switch to `npm ci` (strict, lockfile-preserving) via a new
`_run_npm_install_deterministic` helper that falls back to `npm install`
when the lockfile is missing or out of sync (WIP forks).
Verified locally: all three lockfiles stay byte-identical after the real
_build_web_ui / _update_node_dependencies run twice back-to-back. Fallback
path tested with a deliberately out-of-sync lockfile and a no-lockfile case.
Four independent session-UX bugs reported by an external user (#16294).
/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.
'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.
TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.
Closes#16294.
Adds NOTION_API_KEY, LINEAR_API_KEY, TENOR_API_KEY, and AIRTABLE_API_KEY
to OPTIONAL_ENV_VARS so:
- They persist to ~/.hermes/.env via save_env_value like every other
key Hermes knows about, instead of being ad-hoc variables the user
has to hand-edit the dotfile for.
- load_env() / reload_env() populate os.environ from .env on every
startup — the user sets the key once, skills keep working across
restarts without losing access.
- hermes setup / hermes config show surface them as known optional
vars with the correct signup URL (linear.app/settings/api,
airtable.com/create/tokens, etc.).
These four entries use category="skill" (new) rather than "tool".
tools/environments/local.py auto-adds every category=tool/messaging
entry to _HERMES_PROVIDER_ENV_BLOCKLIST, which stops env passthrough
from leaking provider credentials into the execute_code sandbox
(GHSA-rhgp-j443-p4rf). Skill API keys are the opposite case — the
point is for the agent's subprocess to see them so curl can read
Authorization headers — so they must be outside the blocklist. The
new category is inert for that check.
All four entries are advanced=True: they show up in 'hermes config'
and 'hermes status' displays, but do not nag users who have never
touched those skills during setup checklists.
E2E verified: save_env_value → reload_env → os.environ populated →
skill_view reports setup_needed=False → env_passthrough registers
the key for subprocess inheritance.
_web_ui_build_needed() in PR #14914 checked web_dir/"dist" as the
sentinel, but vite.config.ts sets outDir: "../hermes_cli/web_dist" so
the build output lands in hermes_cli/web_dist/, never in web/dist/.
The sentinel was therefore always missing → _web_ui_build_needed always
returned True → npm install + Vite build ran on every startup → OOM on
low-memory VPS persisted unchanged.
Fix: derive dist_dir as web_dir.parent / "hermes_cli" / "web_dist" so
the sentinel points to the actual build output directory.
Fixes#14898
`delete_session()` and `prune_sessions()` only removed SQLite records,
leaving .json/.jsonl transcript files on disk forever. Over time this
causes unbounded disk growth (~27MB/day observed).
Changes:
- Add `_remove_session_files()` static helper that cleans up
`{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json`
- `delete_session()` accepts optional `sessions_dir` param and removes
files for the deleted session and its children
- `prune_sessions()` accepts optional `sessions_dir` param and removes
files for all pruned sessions after the DB transaction
- Wire up CLI `hermes sessions delete` and `hermes sessions prune` to
pass `sessions_dir`
- File cleanup is best-effort (OSError silenced) so DB operations are
never blocked by filesystem issues
- Fully backward-compatible: `sessions_dir=None` (default) preserves
existing behavior
Enter while the agent is busy can now inject the typed text via /steer —
arriving at the agent after the next tool call — instead of interrupting
(current default) or queueing for the next turn.
Changes:
- cli.py: keybinding honors busy_input_mode='steer' by calling
agent.steer(text) on the UI thread (thread-safe), with automatic
fallback to 'queue' when the agent is missing, steer() is unavailable,
images are attached, or steer() rejects the payload. /busy accepts
'steer' as a fourth argument alongside queue/interrupt/status.
- gateway/run.py: busy-message handler and the PRIORITY running-agent
path both route through running_agent.steer() when the mode is 'steer',
with the same fallback-to-queue safety net. Ack wording tells users
their message was steered into the current run. Restart-drain queueing
now also activates for 'steer' so messages aren't lost across restarts.
- agent/onboarding.py: first-touch hint has a steer branch for both
CLI and gateway.
- hermes_cli/commands.py: /busy args_hint updated to include steer,
and 'steer' is registered as a subcommand (completions).
- hermes_cli/web_server.py: dashboard select widget offers steer.
- hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py:
inline docs updated.
- website/docs/user-guide/cli.md + messaging/index.md: documented.
- Tests: steer set/status path for /busy; onboarding hints;
_load_busy_input_mode accepts steer; busy-session ack exercises
steer success + two fallback-to-queue branches.
Requested on X by @CodingAcct.
Default is unchanged (interrupt).
After #14798 made cron honor per-platform `hermes tools` config, the
`_DEFAULT_OFF_TOOLSETS` filter silently stripped `homeassistant` from
cron jobs for users who'd been relying on the previous blanket toolset.
Norbert's HA cron reports regressed as a result.
The HA toolset is already runtime-gated by its `check_fn` (requires
HASS_TOKEN to register any tools). When HASS_TOKEN is set the user has
explicitly opted in — `_DEFAULT_OFF_TOOLSETS` adds nothing in that case,
so stop double-gating and restore HA for cron / cli / other platforms
without an explicit saved toolset list.
moa and rl stay off by default (original #14798 goal preserved).
Fixes HA cron regression reported by Norbert.
Removes deepseek/deepseek-v4-pro and deepseek/deepseek-v4-flash from
OPENROUTER_MODELS and _PROVIDER_MODELS['nous'], then regenerates
website/static/api/model-catalog.json so the hosted picker JSON drops
them too. Direct-API deepseek provider support is unchanged.
* fix(install): add /usr/local/bin PATH guard for RHEL root non-login shells
The FHS-layout branch assumed /usr/local/bin is on PATH for every
standard shell. That holds for login shells (via /etc/profile's
pathmunge) but breaks on RHEL/CentOS/Rocky/Alma 8+ root in non-login
interactive shells (su, sudo -s, tmux panes, some web terminals) —
/etc/bashrc does not add /usr/local/bin and /root/.bash_profile
doesn't either. Result: hermes command links to /usr/local/bin/hermes
but the user has to type the absolute path each time.
Probe a fresh 'bash -i -c' (non-login interactive, matching the user
scenario) after symlinking. If hermes isn't resolvable, append an
idempotent PATH guard to /root/.bashrc and /root/.bash_profile, same
grep pattern already used by the ~/.local/bin branch below. No change
on distros where /usr/local/bin is already inherited.
* fix(update): repair RHEL root PATH on hermes update
Existing RHEL/CentOS/Rocky/Alma root installs won't be repaired by the
install.sh fix alone because 'hermes update' is an in-place git pull, not
a rerun of install.sh. Port the same probe + idempotent .bashrc write
into cmd_update so affected users get fixed automatically on next update.
_ensure_fhs_path_guard() runs after 'Update complete!':
- Linux + root + FHS-layout install (command at /usr/local/bin/hermes) only
- Probe: env -i bash -i -c 'command -v hermes' — fresh non-login interactive
shell, same scenario the user reports
- On failure, append PATH guard to /root/.bashrc and /root/.bash_profile,
skipping if any uncommented PATH line already mentions /usr/local/bin
- Silent no-op on macOS, non-root, legacy layout, or shells that already
resolve hermes
Every command in COMMAND_REGISTRY (/btw, /stop, /model, /help, /new,
/bg, /reset, ...) is now a first-class Slack slash command instead of
a /hermes <subcommand>. Users get the same autocomplete-driven slash
picker experience Slack users expect and that Discord and Telegram
already provide.
Previously Slack registered ONE native slash (/hermes) and split on
the first word, so typing /btw in Slack's composer got 'couldn't find
an app for /btw' because the workspace manifest never declared it.
Changes
- hermes_cli/commands.py: slack_native_slashes() + slack_app_manifest()
generate a Slack manifest from the registry (canonical names +
aliases + plugin commands), clamped to Slack's 50-slash cap with
/hermes reserved as the catch-all.
- gateway/platforms/slack.py: single regex matcher dispatches every
registered slash to _handle_slash_command, which dispatches on
command['command']. Legacy /hermes <subcommand> keeps working for
backward compat with older workspace manifests.
- hermes_cli/slack_cli.py + hermes_cli/main.py: new 'hermes slack
manifest' command prints/writes a full manifest (display info,
OAuth scopes, event subs, socket mode, slash commands) ready to
paste into 'Create from manifest' or Features → App Manifest.
- hermes_cli/setup.py: _setup_slack() now writes the manifest up-front
and points users at the 'From an app manifest' flow; also offers
to refresh the manifest on reconfigure for picking up new commands.
- Tests: 14 new tests covering native-slash dispatch (/btw, /stop,
/model), legacy /hermes <sub> compat, manifest structure, and
telegram<->slack parity (every Telegram command must also register
as a Slack slash). Existing /hermes-registration test updated to
assert the new regex matches /hermes, /btw, /stop, /model, /help.
- Docs: slack.md gains a 'Slash Commands' section + Option A manifest
flow in Step 1; cli-commands.md documents 'hermes slack manifest'.
Users pick up the new slashes by running 'hermes slack manifest --write'
and pasting into Features → App Manifest → Edit in their Slack app
config, then Save (Slack prompts for reinstall if scopes changed).
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
'hermes skills list' now shows every skill's enabled/disabled status
and accepts --enabled-only to filter down to what will actually load
for the active profile:
hermes -p dario skills list --enabled-only
Previously the command was a flat catalog — it did not apply
skills.disabled from config.yaml, so there was no way to see the
live skill set for a profile without reading config by hand.
Profile switching already works via -p (swaps HERMES_HOME); this
just surfaces the result visibly.
Changes:
- hermes_cli/skills_hub.py: do_list adds a Status column and an
enabled_only filter; summary reports enabled/disabled split
- hermes_cli/main.py: --enabled-only flag on 'skills list'
- /skills list slash command accepts --enabled-only too
- tests: 4 new (status column, disabled marking, enabled-only
hiding, no platform leakage into get_disabled_skill_names);
existing fixtures updated to accept skip_disabled kwarg
Reported by @mochizukimr on X.
Follow-up to cherry-picked PR #15920:
- agent/credential_pool.py: hoist 'from hermes_cli.config import get_env_value'
to module top instead of inline try/except in each seed site (3 sites).
No import cycle — hermes_cli/config.py doesn't depend on agent.credential_pool.
- hermes_cli/auth.py: same hoist for the _resolve_api_key_provider_secret loop.
- tests/tools/test_credential_pool_env_fallback.py: replace smoke-only tests
with real .env file I/O. Each test writes a temp ~/.hermes/.env, verifies
_seed_from_env / _resolve_api_key_provider_secret read from it, and asserts
the full priority chain: os.environ > .env > credential_pool. Uses
'deepseek' as the test provider since 'openai' isn't in PROVIDER_REGISTRY
and _seed_from_env's generic path requires a real pconfig lookup.
_resolve_api_key_provider_secret() and _seed_from_env() only checked
os.environ for provider API keys. When keys exist in ~/.hermes/.env but
are not loaded into the process environment (e.g. ACP adapter entry
point, post-session-start .env edits, or non-CLI entry points), the
resolution returns an empty string, causing HTTP 401 failures.
Changes:
- credential_pool._seed_from_env: use get_env_value() which checks both
os.environ and ~/.hermes/.env file, preventing _prune_stale_seeded_entries
from removing valid entries whose env var isn't in os.environ
- credential_pool._seed_from_env: same fix for openrouter and
base_url_env_var resolution
- auth._resolve_api_key_provider_secret: use get_env_value() instead of
os.getenv(), and add credential_pool fallback when env resolution fails
Fixes#15914
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
the running-agent guard (board writes don't touch agent state), so
/kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.
Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.
Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py
Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
Manage the fallback_providers chain from the CLI instead of hand-editing
config.yaml. The picker reuses select_provider_and_model() from 'hermes
model' — same provider list, same credential prompts, same model picker.
hermes fallback [list] Show the current chain (primary + fallbacks)
hermes fallback add Run the model picker, append selection to chain
hermes fallback remove Pick an entry to delete (arrow-key menu)
hermes fallback clear Remove all entries (with confirmation)
'add' snapshots config['model'] before calling the picker, extracts the
user's selection from the post-picker state, then restores the primary
and appends {provider, model, base_url?, api_mode?} to fallback_providers.
Auth store's active_provider is snapshot/restored too so OAuth-provider
fallbacks don't silently deactivate the user's primary. Duplicates and
self-as-fallback are rejected. Legacy single-dict 'fallback_model' entries
are auto-migrated to the list format on first write.
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the busy-ack
explaining the /busy queue vs /busy interrupt knob, phrased to match the
mode that was just applied (don't tell a queue-mode user to switch to
queue).
2. First tool that runs for >= 30s in the noisiest progress mode
(tool_progress: all) — prints a hint about /verbose to cycle display
modes (all -> new -> off -> verbose). Gated on /verbose actually being
usable on the surface: always shown on CLI; on gateway only shown when
display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
load_cli_config defaults (cli.py). No _config_version bump — deep
merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
after building the ack. progress_callback tracks tool.completed
duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
Enter; _on_tool_progress appends the tool-progress hint on the first
>=30s tool completion. In-memory CLI_CONFIG is also updated so
subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a release.
Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).
Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).
Config (new model_catalog section, default enabled):
model_catalog.url master manifest URL
model_catalog.ttl_hours disk cache TTL (default 24h)
model_catalog.providers.<name>.url optional per-provider override
Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.
Changes:
- website/static/api/model-catalog.json initial manifest (35 OR + 31 Nous)
- scripts/build_model_catalog.py regenerator from in-repo lists
- hermes_cli/model_catalog.py fetch + validate + cache module
- hermes_cli/models.py fetch_openrouter_models() +
new get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py Nous flows use the helper
- hermes_cli/config.py model_catalog defaults
- website/docs/reference/model-catalog.md + sidebars.ts
- tests/hermes_cli/test_model_catalog.py 21 tests (validation, fetch
success/failure, accessors,
disabled, overrides, integration)
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.
Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
_serialize_payload already surfaces non-top-level kwargs under
payload['extra'], so duration_ms lands at extra.duration_ms for
shell-hook scripts
Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.
Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).
Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.
Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section: `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default)
The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them.
The 'Quick Setup - configure missing items only' menu option is now
exposed as the explicit `--quick` flag; it's the narrow case of
filling in missing config (e.g. after a partial OpenClaw migration or
when a required API key got cleared).
Inspired by Mercury Agent's `mercury doctor` UX.
Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
(guarding behavior that no longer exists — covered by
test_setup_reconfigure.py instead)
The azure-foundry wizard now probes the endpoint before asking the user
to pick anything by hand:
1. URL path sniff — endpoints ending in /anthropic are Azure Foundry
Claude routes and skip to anthropic_messages.
2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped
model list, we switch to chat_completions and prefill the picker
with the returned deployment/model IDs.
3. Anthropic Messages probe — fallback for endpoints that don't expose
/models but do speak the Anthropic Messages shape.
4. Manual fallback — private endpoints / custom routes still work;
the user picks API mode + types a deployment name.
Context length for the selected model is resolved through the existing
agent.model_metadata.get_model_context_length chain (models.dev,
provider metadata, hardcoded family fallbacks) and stored in
model.context_length when a non-default value is found.
Also refactors runtime_provider so Azure Foundry resolution is reused
between the explicit-credentials path and the default top-level path —
previously the /v1 strip for Anthropic-style Azure only ran when the
caller passed explicit_* args, which meant config-driven sessions
hit a double-/v1 URL.
New module hermes_cli/azure_detect.py with 19 unit tests covering:
- path sniff, model ID extraction, probe fallbacks
- HTTP error handling (URLError, HTTPError)
- context-length lookup passthrough
- DEFAULT_FALLBACK_CONTEXT rejection
New runtime tests cover:
- OpenAI-style Azure Foundry
- Anthropic-style Azure Foundry with /v1 stripping
- Missing base_url / API key raising AuthError
Rationale: Microsoft confirms there's no pure-API-key endpoint to list
Azure deployments (that requires ARM management auth). The v1 Azure
OpenAI endpoint does expose /models with the resource's available
model catalog, which is good enough for picker prefill in the common
case. Users on private/gated endpoints fall through to manual entry.
Add support for Azure Foundry as a new inference provider. Azure Foundry
endpoints can use either OpenAI-style (/v1/chat/completions) or
Anthropic-style (/v1/messages) API formats.
Changes:
- Add azure-foundry to PROVIDER_REGISTRY (auth.py)
- Add azure-foundry overlay in HERMES_OVERLAYS (providers.py)
- Add empty model list for azure-foundry (models.py)
- Add _model_flow_azure_foundry() interactive setup (main.py)
- Add azure-foundry runtime resolution with api_mode support (runtime_provider.py)
- Add AZURE_FOUNDRY_API_KEY and AZURE_FOUNDRY_BASE_URL env vars (config.py)
Usage:
hermes model -> More providers -> Azure Foundry
The setup wizard prompts for:
- Endpoint URL
- API format (OpenAI or Anthropic-style)
- API key
- Model name
Configuration is saved to config.yaml (model.provider, model.base_url,
model.api_mode, model.default) and ~/.hermes/.env (AZURE_FOUNDRY_API_KEY).
Fixes#15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.
## What changed
New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.
`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).
Wired through five call sites that previously either duplicated the lookup or ignored it entirely:
- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)
## Context probe tiers
`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.
## Repro (from #15779)
```yaml
custom_providers:
- name: my-custom-endpoint
base_url: https://example.invalid/v1
model: gpt-5.5
models:
gpt-5.5:
context_length: 1050000
```
`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".
## Tests
- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier
The raw-template lookup added in PR #15817 went through
`get_compatible_custom_providers(read_raw_config())`, which calls
`_normalize_custom_provider_entry` → `urlparse(base_url)`. Any
entry whose `base_url` is itself an env-ref (`${NEURALWATT_API_BASE}`)
was dropped as 'not a valid URL', so `api_key_ref` stayed empty and the
resolved secret was still written to `model.api_key` — the exact case
the original Discord report described.
Replace the normalizer-gated lookup with a direct read of
`raw['custom_providers']` and `raw['providers']`, indexed by name
(case-insensitive, optionally qualified by model) so the loaded
(expanded) entry can be matched regardless of how `base_url` is
written.
Add an integration regression test driving the real
`select_provider_and_model` entry point with the Discord-reported
NeuralWatt config (`${VAR}` in both `base_url` and `api_key`).
This test fails on the PR-only fix and passes with the broadened
lookup.
When switching models on a custom endpoint (ollama-launch):
- Same-provider switches no longer re-resolve credentials (fixes base_url
being lost for 'custom' provider on subsequent switches)
- Named providers (ollama-launch) are resolved via user_providers so
switch_model can find their base_url from config
- Models not in the /v1/models probe but present in the user's saved
provider config are accepted with a warning instead of rejected
- CLI /model and TUI /model both pass user_providers/custom_providers
to switch_model so the config model list is available for validation
Closes#15088
- expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution
- keep startup alias resolution network-free and add regression tests in models and tui gateway suites
The post-graceful-drain is-active poll used a fixed 10s timeout, but
systemd's hermes-gateway.service has RestartSec=30 — so systemd won't
respawn the unit for 30s after exit-75, and our poll gives up during
the cooldown. Result: every 'hermes update' printed
⚠ hermes-gateway drained but didn't relaunch — forcing restart
followed by a redundant 'systemctl restart' that kicked the newly-
respawning gateway again (and re-started WhatsApp / Discord a second
time in the process).
Fix: read RestartUSec from the unit via 'systemctl show' and set the
poll budget to max(10s, RestartSec + 10s slack). Units without
RestartSec set (or value=infinity) fall back to the original 10s.
Observed timeline from journalctl before fix:
08:56:22.262 old PID exits 75
08:56:32.707 systemd logs Stopped -> Started (10.4s gap, > 10s budget)
After fix the poll covers 40s — comfortably inside RestartSec + slack.
Validation:
- RestartUSec parser tested against '30s', '100ms', '1min 30s',
'infinity', '', 'garbage', '500us', '2min' — all correct.
- Against the live hermes-gateway.service: parses to 30.0s.
- tests/hermes_cli/test_update_gateway_restart.py: 41/41 pass.
Makes hermes -z usable by sweeper without mutating user config.
- Top-level -m/--model and --provider flags that apply to -z/--oneshot
(mirrors hermes chat's plumbing).
- HERMES_INFERENCE_MODEL env var as the parallel to HERMES_INFERENCE_PROVIDER
for CI / scripted invocations.
- resolve_runtime_provider() gets the requested provider; when --model is
given without --provider, detect_provider_for_model() auto-selects the
provider that serves it (same semantic as /model in an interactive session).
- --provider without --model errors out with exit 2 — carrying a config
model across to a different provider is usually wrong, and silently
picking the provider's catalog default hides the mismatch.
Config defaults still used when both flags are omitted (existing behavior).
Validation (all live against OpenRouter):
-z 'x' ....................... uses config default (opus-4.7)
-z 'x' --model haiku-4.5 ..... haiku-4.5 via auto-detected openrouter
-z 'x' --model ... --provider pair as given
HERMES_INFERENCE_MODEL=... -z haiku-4.5 via env var
-z 'x' --provider anthropic .. exits 2 with error to stderr
* feat: add `hermes -z <prompt>` one-shot mode
Top-level flag that runs a single prompt and prints ONLY the final
response text to stdout. No banner, no spinner, no tool previews, no
session_id line — stdout is machine-readable, stderr is silent.
Tools, memory, rules, and AGENTS.md in the CWD are loaded as normal.
Approvals are auto-bypassed (sets HERMES_YOLO_MODE=1 for the call).
Bypasses cli.py entirely — goes straight to AIAgent.chat().
* feat(oneshot): handle interactive-callback gaps explicitly
Document (and where needed, patch) the interactive surfaces that have
no user to answer in oneshot mode:
- clarify — inject a callback that tells the agent to pick the
best default and continue (previously returned a
generic 'not available in this execution context'
error that wastes a tool call)
- sudo password — terminal_tool already gates on HERMES_INTERACTIVE
(we don't set it); sudo fails gracefully
- shell hooks — HERMES_ACCEPT_HOOKS=1 auto-approves; also falls
back to deny on non-tty stdin
- dangerous cmd — HERMES_YOLO_MODE=1 short-circuits before input()
- secret capture— tool returns gracefully when no callback wired
Live-tested: agent asked clarify(['red','blue']) and got 'red' back,
replied with only 'red'.
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
path when introduced; the background review now fires every 10 user
turns on CLI and gateway alike, which is far more frequent than
compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live agent
before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix
(system prompt + memory-only tool list) that diverged from the live
conversation's cached prefix, invalidating prompt caching. The
gateway variant spawned a fresh AIAgent with its own clean prompt
for each finalized session — still cache-breaking, just in a
different process.
- Redundant. Background review runs in the live conversation's
session context, gets the same content, writes to the same memory
store, and doesn't break the cache. Everything flush_memories
claimed to preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
(and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
_check_compression_model_feasibility (headroom was only needed
because flush dragged the full main-agent system prompt along;
the compression summariser sends a single user-role prompt so
new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
The session-expiry watcher still uses the flag to avoid re-running
finalize/eviction on the same expired session; the new name
reflects what it now actually gates. from_dict() reads
'expiry_finalized' first, falls back to the legacy 'memory_flushed'
key so existing sessions.json files upgrade seamlessly.
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites. No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
The auto-restart path in `hermes update` verifies systemd unit health with
`time.sleep(3)` + a single `systemctl is-active` call. The unit's
Stopped -> Started transition after a graceful SIGUSR1 exit (or a hard
restart) is not always complete inside that 3s window, so the verify
races and reports 'drained but didn't relaunch' even though systemd is
about to bring the unit back up a fraction of a second later. Users
then see a spurious warning, a redundant fallback `systemctl restart`
fires, and adapters (Discord, WhatsApp) get restarted twice.
Replace the three sleep+oneshot sites with a small `_wait_for_service_active()`
closure that polls `is-active` every 0.5s for up to 10s. Behaviour
is unchanged when the unit is healthy or truly dead — only the race
window around a clean restart is now handled correctly.
Tests: tests/hermes_cli/test_update_gateway_restart.py (41/41).
`hermes tools` → "reconfigure existing" listed Spotify twice because
the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174)
left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets()
unconditionally appended get_plugin_toolsets() on top, so the same
'spotify' key showed up from both sources.
Dedupe by key — built-in CONFIGURABLE_TOOLSETS entry wins (it has the
nicer label and description). Also guards against future bundled plugins
that share a toolset key with a built-in.
Both discord (read/participate) and discord_admin (server admin) are now
configurable via `hermes tools` with default-OFF. Previously the core
discord tool (fetch_messages, search_members, create_thread) auto-loaded
on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user
never opted into.
Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so
the discord toolsets only show up in the Discord platform's checklist,
not on CLI/Telegram/Slack/etc. Applied at four gates:
- _prompt_toolset_checklist: checklist filter
- _get_platform_tools: resolution filter (both branches)
- _save_platform_tools: save-time filter (covers 'Configure all
platforms' and hand-edited config.yaml)
- tools_disable_enable_command: rejects `hermes tools enable discord`
on non-Discord platforms with a clear error
build_session_context_prompt now injects the Discord IDs block only
when both conditions hold: the discord/discord_admin toolset is
enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough —
the tool's check_fn gates on the token at registry time, so opting
in without a token yields no tools and the IDs block would lie.
Otherwise keep the stale-API disclaimer.
Split the monolithic discord_server tool (14 actions) into two:
- discord: core actions (fetch_messages, search_members, create_thread)
that are useful for the agent's normal operation. Auto-enabled on
the discord platform via the pipeline fix.
- discord_admin: server management actions (list channels/roles, pins,
role assignment) that require explicit opt-in via hermes tools.
Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.
The reverse-mapping loop in _get_platform_tools only checked
CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets
like discord and feishu_doc whose tools were in the composite but
had no configurable key. Add a second pass over TOOLSETS that picks
up unclaimed toolsets whose tools are present in the resolved
composite.
YAML parses bare numeric toolset names (e.g. 12306:) as int, causing
TypeError in sorted() since the read path normalizes to str but the
save path did not.
The no_mcp sentinel was preserved in existing entries even when the
user re-enabled MCP servers, causing MCP to stay silently disabled.
Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval
callback lives in tools/terminal_tool.py's threading.local(), which worker
threads do not inherit. When a subagent hits a dangerous-command guard,
prompt_dangerous_approval() falls back to input() from the worker thread,
deadlocking against the parent's prompt_toolkit TUI that owns stdin.
Fix: install a non-interactive callback into every subagent worker thread
via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)).
The callback is config-gated by delegation.subagent_auto_approve:
false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist)
true -> _subagent_auto_approve (opt-in YOLO for cron/batch)
Both emit a logger.warning audit line. Gateway sessions are unaffected
because they resolve approvals via tools/approval.py's per-session queue,
not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685).
- hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False
- cli-config.yaml.example: documented, commented (default)
- tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve,
_get_subagent_approval_callback, wired into the child timeout executor
- tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion,
and TLS scoping in the worker thread
/model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because
the display block used ModelInfo.context_window directly from models.dev.
Codex OAuth actually enforces 272K for the same slug, and the agent's
compressor already runs at 272K via get_model_context_length() — so the
banner + real context budget said 272K while /model lied with 1M.
Route the display context through a new resolve_display_context_length()
helper that always prefers agent.model_metadata.get_model_context_length
(which knows about Codex OAuth, Copilot, Nous caps) and only falls back
to models.dev when that returns nothing.
Fix applied to all 3 /model display sites:
cli.py _handle_model_switch
gateway/run.py picker on_model_selected callback
gateway/run.py text-fallback confirmation
Reported by @emilstridell (Telegram, April 2026).
_load_auth_store() caught all parse/read exceptions and silently
returned an empty store, making corruption look like a logout with
no diagnostic information and no way to recover the original file.
Now copies the corrupt file to auth.json.corrupt before resetting,
and logs a warning with the exception and backup path.
_submit_anthropic_pkce() retrieved sess under _oauth_sessions_lock but
wrote back to sess["status"] and sess["error_message"] outside the lock.
A concurrent session GC or cancel could race with these writes, producing
inconsistent session state.
Wrap all 4 sess write sites in _oauth_sessions_lock:
- network exception path (Token exchange failed)
- missing access_token path
- credential save failure path
- success path (approved)
The web UI schema advertised 'block' as a busy_input_mode choice, but
no implementation ever existed — the gateway and CLI both silently
collapsed 'block' (and anything other than 'queue') to 'interrupt'.
Users who picked 'block' in the dashboard got interrupts anyway.
Drop 'block' from the select options. The two supported modes are
'interrupt' (default) and 'queue'.
Replaces gpt-5.4 / gpt-5.4-pro entries in the OpenRouter fallback snapshot
and the Nous Portal curated list. Other aggregators (Vercel AI Gateway)
and provider-native lists are unchanged.
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.
Architecture:
browser <Terminal> (xterm.js)
│ onData ───► ws.send(keystrokes)
│ onResize ► ws.send('\x1b[RESIZE:cols;rows]')
│ write ◄── ws.onmessage (PTY bytes)
▼
FastAPI /api/pty (token-gated, loopback-only)
▼
PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent
Components
----------
hermes_cli/pty_bridge.py
Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
inherently byte-oriented and UTF-8 boundaries may land mid-read),
non-blocking select-based reads, TIOCSWINSZ resize, idempotent
SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
is WSL-supported only).
hermes_cli/web_server.py
@app.websocket("/api/pty") endpoint gated by the existing
_SESSION_TOKEN (via ?token= query param since browsers can't set
Authorization on WS upgrades). Loopback-only enforcement. Reader task
uses run_in_executor to pump PTY bytes without blocking the event
loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
before forwarding to the PTY. The endpoint resolves the TUI argv
through a _resolve_chat_argv hook so tests can inject fake commands
without building the real TUI.
Tests
-----
tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.
tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.
96 tests pass under scripts/run_tests.sh.
(cherry picked from commit 29b337bca70fc9efb082a5a852ea2cd5381af1a9)
feat(web): add Chat tab with xterm.js terminal + Sessions resume button
(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
against current main: BUILTIN_ROUTES table + plugin slot layout)
fix(tui): replace OSC 52 jargon in /copy confirmation
When the user ran /copy successfully, Ink confirmed with:
sent OSC52 copy sequence (terminal support required)
That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.
Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:
copied 1482 chars
(cherry picked from commit a0701b1d5a598dd1d3b94038a7bcbb2a3ab559fc)
docs: document the dashboard Chat tab
AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'
website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).
(cherry picked from commit 2c2e32cc4519973c77b63016316b065c0f656704)
feat(tui-gateway): transport-aware dispatch + WebSocket sidecar
Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.
- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
+ module-level `StdioTransport` fallback. The stdio stream resolves
through a lambda so existing tests that monkey-patch `_real_stdout`
keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
- `write_json` routes via session transport (for async events) →
contextvar transport (for in-request writes) → stdio fallback.
- `dispatch(req, transport=None)` binds the transport for the request
lifetime and propagates it to pool workers via `contextvars.copy_context`
so async handlers don't lose their sink.
- `_init_session` and the manual-session create path stash the
request's transport so out-of-band events (subagent.complete, etc.)
fan out to the right peer.
`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.
feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal
Composes the two transports into a single Chat tab:
┌─────────────────────────────────────────┬──────────────┐
│ xterm.js / PTY (emozilla #13379) │ ChatSidebar │
│ the literal hermes --tui process │ /api/ws │
└─────────────────────────────────────────┴──────────────┘
terminal bytes structured events
The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:
• model + provider badge with connection state (click → switch)
• running tool-call list (driven by tool.start / tool.progress /
tool.complete events)
• model picker dialog (gateway-driven, reuses ModelPickerDialog)
The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.
- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
Owns its GatewayClient, drives the model picker through
`slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
(`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.
Co-authored-by: emozilla <emozilla@nousresearch.com>
refactor(web): /clean pass on ChatSidebar + ChatPage lint debt
- ChatSidebar: lift gw out of useRef into a useMemo derived from a
reconnect counter. React 19's react-hooks/refs and react-hooks/
set-state-in-effect rules both fire when you touch a ref during
render or call setState from inside a useEffect body. The
counter-derived gw is the canonical pattern for "external resource
that needs to be replaceable on user action" — re-creating the
client comes from bumping `version`, the effect just wires + tears
down. Drops the imperative `gwRef.current = …` reassign in
reconnect, drops the truthy ref guard in JSX. modelLabel +
banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
so the effect body doesn't have to setState on first run. Drops
the unused react-hooks/exhaustive-deps eslint-disable. Adds a
scoped no-control-regex disable on the SGR mouse parser regex
(the \\x1b is intentional for xterm escape sequences).
All my-touched files now lint clean. Remaining warnings on web/
belong to pre-existing files this PR doesn't touch.
Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.
chore: uptick
fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary
The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.
Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.
feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast
The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.
Cross-process forwarding:
- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
(event_publisher.py, sync websockets client). entry.py installs the
tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
every dispatcher emit to a back-WS without disturbing Ink's stdio
handshake.
- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
(subscriber) endpoints with a per-channel registry. /api/pty now
accepts ?channel= and propagates the sidecar URL via env. start_server
also stashes app.state.bound_port so the URL is constructable.
- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
passes it to /api/pty and as a prop to ChatSidebar.
- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
tool.start/progress/complete back into the ToolCall list. Restores
the ToolCall primitive.
Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).
fix(web): address Copilot review on #14890
Five threads, all real:
- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
the open handshake. Server emits `gateway.ready` immediately after
accept, so a listener attached after the open promise could race past
the initial skin payload and lose it.
- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
WS into the existing error banner. 4401/4403 (auth/loopback reject)
surface as a "reload the page" message; mid-stream drops surface as
"events feed disconnected" with the existing reconnect button. Clean
unmount closes (1000/1001) stay silent.
- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
ptyprocess lives in the `pty` extra, not `web`. Switch to
`hermes-agent[web,pty]` in both prerequisite blocks.
- AGENTS.md: previous "never add a parallel React chat surface" guidance
was overbroad and contradicted this PR's sidebar. Tightened to forbid
re-implementing the transcript/composer/PTY terminal while explicitly
allowing structured supporting widgets (sidebar / model picker /
inspectors), matching the actual architecture.
- web/package-lock.json: regenerated cleanly so the wterm sibling
workspace paths (extraneous machine-local entries) stop polluting CI.
Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.
refactor(web): /clean pass on ChatSidebar events handler
Spotted in the round-2 review:
- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
hit the "unexpected drop" branch. Track `unmounting` in the effect
scope and gate the banner through a `surface()` helper so cleanup
closes stay silent.
- DRY the duplicated "events feed disconnected" string into a local
const used by both the error and close handlers.
- Drop the `opened` flag (no longer needed once the unmount guard is
the source of truth for "is this an expected close?").
A — 'hermes tools' activation now runs the full Spotify wizard.
Previously a user had to (1) toggle the Spotify toolset on in 'hermes
tools' AND (2) separately run 'hermes auth spotify' to actually use
it. The second step was a discovery gap — the docs mentioned it but
nothing in the TUI pointed users there.
Now toggling Spotify on calls login_spotify_command as a post_setup
hook. If the user has no client_id yet, the interactive wizard walks
them through Spotify app creation; if they do, it skips straight to
PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on
AND authenticated. SystemExit from the wizard (user abort) leaves the
toolset enabled and prints a 'run: hermes auth spotify' hint — it
does NOT fail the toolset toggle.
Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard
handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users
to type env var names before the wizard fires was UX-backwards — the
point of the wizard is that they don't HAVE a client_id yet.
B — Docs page now covers cron + Spotify.
New 'Scheduling: Spotify + cron' section with two working examples
(morning playlist, wind-down pause) using the real 'hermes cron add'
CLI surface (verified via 'cron add --help'). Covers the active-device
gotcha, Premium gating, memory isolation, and links to the cron docs.
Also fixed a stale '9 Spotify tools' reference in the setup copy —
we consolidated to 7 tools in #15154.
Validation:
- scripts/run_tests.sh tests/hermes_cli/test_tools_config.py
tests/hermes_cli/test_spotify_auth.py
tests/tools/test_spotify_client.py
→ 54 passed
- website: node scripts/prebuild.mjs && npx docusaurus build
→ SUCCESS, no new warnings
Bug 3 — Stale OAuth token not detected in 'hermes model':
- _model_flow_anthropic used 'has_creds = bool(existing_key)' which treats
any non-empty token (including expired OAuth tokens) as valid.
- Added existing_is_stale_oauth check: if the only credential is an OAuth
token (sk-ant- prefix) with no valid cc_creds fallback, mark it stale
and force the re-auth menu instead of silently accepting a broken token.
Bug 4 — macOS Keychain credentials never read:
- Claude Code >=2.1.114 migrated from ~/.claude/.credentials.json to the
macOS Keychain under service 'Claude Code-credentials'.
- Added _read_claude_code_credentials_from_keychain() using the 'security'
CLI tool; read_claude_code_credentials() now tries Keychain first then
falls back to JSON file.
- Non-Darwin platforms return None from Keychain read immediately.
Tests:
- tests/agent/test_anthropic_keychain.py: 11 cases covering Darwin-only
guard, security command failures, JSON parsing, fallback priority.
- tests/hermes_cli/test_anthropic_model_flow_stale_oauth.py: 8 cases
covering stale OAuth detection, API key passthrough, cc_creds fallback.
Refs: #12905
Moves the Spotify integration from tools/ into plugins/spotify/,
matching the existing pattern established by plugins/image_gen/ for
third-party service integrations.
Why:
- tools/ should be reserved for foundational capabilities (terminal,
read_file, web_search, etc.). tools/providers/ was a one-off
directory created solely for spotify_client.py.
- plugins/ is already the home for image_gen backends, memory
providers, context engines, and standalone hook-based plugins.
Spotify is a third-party service integration and belongs alongside
those, not in tools/.
- Future service integrations (eventually: Deezer, Apple Music, etc.)
now have a pattern to copy.
Changes:
- tools/spotify_tool.py → plugins/spotify/tools.py (handlers + schemas)
- tools/providers/spotify_client.py → plugins/spotify/client.py
- tools/providers/ removed (was only used for Spotify)
- New plugins/spotify/__init__.py with register(ctx) calling
ctx.register_tool() × 7. The handler/check_fn wiring is unchanged.
- New plugins/spotify/plugin.yaml (kind: backend, bundled, auto-load).
- tests/tools/test_spotify_client.py: import paths updated.
tools_config fix — _DEFAULT_OFF_TOOLSETS now wins over plugin auto-enable:
- _get_platform_tools() previously auto-enabled unknown plugin
toolsets for new platforms. That was fine for image_gen (which has
no toolset of its own) but bad for Spotify, which explicitly
requires opt-in (don't ship 7 tool schemas to users who don't use
it). Added a check: if a plugin toolset is in _DEFAULT_OFF_TOOLSETS,
it stays off until the user picks it in 'hermes tools'.
Pre-existing test bug fix:
- tests/hermes_cli/test_plugins.py::test_list_returns_sorted
asserted names were sorted, but list_plugins() sorts by key
(path-derived, e.g. image_gen/openai). With only image_gen plugins
bundled, name and key order happened to agree. Adding plugins/spotify
broke that coincidence (spotify sorts between openai-codex and xai
by name but after xai by key). Updated test to assert key order,
which is what the code actually documents.
Validation:
- scripts/run_tests.sh tests/hermes_cli/test_plugins.py \
tests/hermes_cli/test_tools_config.py \
tests/hermes_cli/test_spotify_auth.py \
tests/tools/test_spotify_client.py \
tests/tools/test_registry.py
→ 143 passed
- E2E plugin load: 'spotify' appears in loaded plugins, all 7 tools
register into the spotify toolset, check_fn gating intact.
Three quality improvements on top of #15121 / #15130 / #15135:
1. Tool consolidation (9 → 7)
- spotify_saved_tracks + spotify_saved_albums → spotify_library with
kind='tracks'|'albums'. Handler code was ~90 percent identical
across the two old tools; the merge is a behavioral no-op.
- spotify_activity dropped. Its 'now_playing' action was a duplicate
of spotify_playback.get_currently_playing (both return identical
204/empty payloads). Its 'recently_played' action moves onto
spotify_playback as a new action — history belongs adjacent to
live state.
- Net: each API call ships 2 fewer tool schemas when the Spotify
toolset is enabled, and the action surface is more discoverable
(everything playback-related is on one tool).
2. Spotify skill (skills/media/spotify/SKILL.md)
Teaches the agent canonical usage patterns so common requests don't
balloon into 4+ tool calls:
- 'play X' = one search, then play by URI (not search + scan +
describe + play)
- 'what's playing' = single get_currently_playing (no preflight
get_state chain)
- Don't retry on '403 Premium required' or '403 No active device' —
both require user action
- URI/URL/bare-ID format normalization
- Full failure-mode reference for 204/401/403/429
3. Surfaced in 'hermes setup' tool status
Adds 'Spotify (PKCE OAuth)' to the tool status list when
auth.json has a Spotify access/refresh token. Matches the
homeassistant pattern but reads from auth.json (OAuth-based) rather
than env vars.
Docs updated to reflect the new 7-tool surface, and mention the
companion skill in the 'Using it' section.
Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after
renaming/replacing the spotify_activity tests with library +
recently_played coverage). Docusaurus build clean.
Salvage of the Gemini-specific piece from PR #12585 by @briandevans.
Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs prefixed
with 'models/' (native Gemini-API convention), so set-membership against
curated bare IDs drops every model. Strip the prefix before comparison.
The Anthropic static-catalog piece of #12585 was subsumed by #12618's
_fetch_anthropic_models() branch landing earlier in the same salvage PR.
Full branch cherry-pick was skipped because it also carried unrelated
catalog-version regressions.
The generic /v1/models probe in validate_requested_model() sent a plain
'Authorization: Bearer <key>' header, which works for OpenAI-compatible
endpoints but results in a 401 Unauthorized from Anthropic's API.
Anthropic requires x-api-key + anthropic-version headers (or Bearer for
OAuth tokens from Claude Code).
Add a provider-specific branch for normalized == 'anthropic' that calls
the existing _fetch_anthropic_models() helper, which already handles
both regular API keys and Claude Code OAuth tokens correctly. This
mirrors the pattern already used for openai-codex, copilot, and bedrock.
The branch also includes:
- fuzzy auto-correct (cutoff 0.9) for near-exact model ID typos
- fuzzy suggestions (cutoff 0.5) when the model is not listed
- graceful fall-through when the token cannot be resolved or the
network is unreachable (accepts with a warning rather than hard-fail)
- a note that newer/preview/snapshot model IDs can be gate-listed
and may still work even if not returned by /v1/models
Fixes Anthropic provider users seeing 'service unreachable' errors
when running /model <claude-model> because every probe 401'd.
- probe_api_models: add api_mode param; use x-api-key + anthropic-version
headers for anthropic_messages mode (Anthropic's native Models API auth)
- probe_api_models: add User-Agent header to avoid Cloudflare 403 blocks
on third-party OpenAI-compatible endpoints
- validate_requested_model: pass api_mode through from switch_model
- validate_requested_model: for anthropic_messages mode, attempt probe with
correct auth; if probe fails (many proxies don't implement /v1/models),
accept the model with an informational warning instead of rejecting
- fetch_api_models: propagate api_mode to probe_api_models
Two small fixes triggered by a support report where the user saw a
cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML
error page, not a real API error) on every gemini-2.5-pro request.
The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but
nothing in our output made that diagnosable:
1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but
omitted Google entirely, so users had no way to verify from 'hermes
dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY
rows.
2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted
an empty/whitespace api_key and stamped it into the x-goog-api-key
header, which made Google's frontend return a generic HTML 400 long
before the request reached the Generative Language backend. Now we
raise RuntimeError at construction with an actionable message
pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com.
Added a regression test that covers '', ' ', and None.
Previously 'hermes auth spotify' crashed with 'HERMES_SPOTIFY_CLIENT_ID
is required' if the user hadn't manually created a Spotify developer
app and set env vars. Now the command detects a missing client_id and
walks the user through the one-time app registration inline:
- Opens https://developer.spotify.com/dashboard in the browser
- Tells the user exactly what to paste into the Spotify form
(including the correct default redirect URI, 127.0.0.1:43827)
- Prompts for the Client ID
- Persists HERMES_SPOTIFY_CLIENT_ID to ~/.hermes/.env so subsequent
runs skip the wizard
- Continues straight into the PKCE OAuth flow
Also prints the docs URL at both the start of the wizard and the end
of a successful login so users can find the full guide.
Adds website/docs/user-guide/features/spotify.md with the complete
setup walkthrough, tool reference, and troubleshooting, and wires it
into the sidebar under User Guide > Features > Advanced.
Fixes a stale redirect URI default in the hermes_cli/tools_config.py
TOOL_CATEGORIES entry (was 8888/callback from the PR description
instead of the actual DEFAULT_SPOTIFY_REDIRECT_URI value
43827/spotify/callback defined in auth.py).
`_normalize_for_deepseek` was mapping every non-reasoner input into
`deepseek-chat` on the assumption that DeepSeek's API accepts only two
model IDs. That assumption no longer holds — `deepseek-v4-pro` and
`deepseek-v4-flash` are first-class IDs accepted by the direct API,
and on aggregators `deepseek-chat` routes explicitly to V3 (DeepInfra
backend returns `deepseek-chat-v3`). So a user picking V4 Pro through
the model picker was being silently downgraded to V3.
Verified 2026-04-24 against Nous portal's OpenAI-compat surface:
- `deepseek/deepseek-v4-flash` → provider: DeepSeek,
model: deepseek-v4-flash-20260423
- `deepseek/deepseek-chat` → provider: DeepInfra,
model: deepseek/deepseek-chat-v3
Fix:
- Add `deepseek-v4-pro` and `deepseek-v4-flash` to
`_DEEPSEEK_CANONICAL_MODELS` so exact matches pass through.
- Add `_DEEPSEEK_V_SERIES_RE` (`^deepseek-v\d+(...)?$`) so future
V-series IDs (`deepseek-v5-*`, dated variants) keep passing through
without another code change.
- Update docstring + module header to reflect the new rule.
Tests:
- New `TestDeepseekVSeriesPassThrough` — 8 parametrized cases covering
bare, vendor-prefixed, case-variant, dated, and future V-series IDs
plus end-to-end `normalize_model_for_provider(..., "deepseek")`.
- New `TestDeepseekCanonicalAndReasonerMapping` — regression coverage
for canonical pass-through, reasoner-keyword folding, and
fall-back-to-chat behaviour.
- 77/77 pass.
Reported on Discord (Ufonik, Don Piedro): `/model > Deepseek >
deepseek-v4-pro` surfaced
`Normalized 'deepseek-v4-pro' to 'deepseek-chat'`. Picker listing
showed the v4 names, so validation also rejected the post-normalize
`deepseek-chat` as "not in provider listing" — the contradiction
users saw. Normalizer now respects the picker's choice.
Follow-up on top of #15096 cherry-pick:
- Remove spotify_* from _HERMES_CORE_TOOLS (keep only in the 'spotify'
toolset, so the 9 Spotify tool schemas are not shipped to every user).
- Add 'spotify' to CONFIGURABLE_TOOLSETS + _DEFAULT_OFF_TOOLSETS so new
installs get it opt-in via 'hermes tools', matching homeassistant/rl.
- Wire TOOL_CATEGORIES entry pointing at 'hermes auth spotify' for the
actual PKCE login (optional HERMES_SPOTIFY_CLIENT_ID /
HERMES_SPOTIFY_REDIRECT_URI env vars).
- scripts/release.py: map contributor email to GitHub login.
The Copilot provider resolved context windows via models.dev static data,
which does not include account-specific models (e.g. claude-opus-4.6-1m
with 1M context). This adds the live Copilot /models API as a higher-
priority source for copilot/copilot-acp/github-copilot providers.
New helper get_copilot_model_context() in hermes_cli/models.py extracts
capabilities.limits.max_prompt_tokens from the cached catalog. Results
are cached in-process for 1 hour.
In agent/model_metadata.py, step 5a queries the live API before falling
through to models.dev (step 5b). This ensures account-specific models
get correct context windows while standard models still have a fallback.
Part 1 of #7731.
Refs: #7272
Raw GitHub tokens (gho_/github_pat_/ghu_) are now exchanged for
short-lived Copilot API tokens via /copilot_internal/v2/token before
being used as Bearer credentials. This is required to access
internal-only models (e.g. claude-opus-4.6-1m with 1M context).
Implementation:
- exchange_copilot_token(): calls the token exchange endpoint with
in-process caching (dict keyed by SHA-256 fingerprint), refreshed
2 minutes before expiry. No disk persistence — gateway is long-running
so in-memory cache is sufficient.
- get_copilot_api_token(): convenience wrapper with graceful fallback —
returns exchanged token on success, raw token on failure.
- Both callers (hermes_cli/auth.py and agent/credential_pool.py) now
pipe the raw token through get_copilot_api_token() before use.
12 new tests covering exchange, caching, expiry, error handling,
fingerprinting, and caller integration. All 185 existing copilot/auth
tests pass.
Part 2 of #7731.
Two narrow fixes motivated by #15099.
1. _seed_from_singletons() was dropping obtained_at, agent_key_obtained_at,
expires_in, and friends when seeding device_code pool entries from the
providers.nous singleton. Fresh credentials showed up with
obtained_at=None, which broke downstream freshness-sensitive consumers
(self-heal hooks, pool pruning by age) — they treated just-minted
credentials as older than they actually were and evicted them.
2. When the Nous Portal OAuth 2.1 server returns invalid_grant with
'Refresh token reuse detected' in the error_description, rewrite the
message to explain the likely cause (an external process consumed the
rotated RT without persisting it back) and the mitigation. The generic
reuse message led users to report this as a Hermes persistence bug when
the actual trigger was typically a third-party monitoring script calling
/api/oauth/token directly. Non-reuse errors keep their original server
description untouched.
Closes#15099.
Regression tests:
- tests/agent/test_credential_pool.py::test_nous_seed_from_singletons_preserves_obtained_at_timestamps
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_token_reuse_detection_surfaces_actionable_message
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_non_reuse_error_keeps_original_description
Cron jobs can now specify a per-job working directory. When set, the job
runs as if launched from that directory: AGENTS.md / CLAUDE.md /
.cursorrules from that dir are injected into the system prompt, and the
terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD).
When unset, old behaviour is preserved (no project context files, tools
use the scheduler's cwd).
Requested by @bluthcy.
## Mechanism
- cron/jobs.py: create_job / update_job accept 'workdir'; validated to
be an absolute existing directory at create/update time.
- cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD
at it and flip skip_context_files to False before building the agent.
Restored in finally on every exit path.
- cron/scheduler.py tick: workdir jobs run sequentially (outside the
thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs
still run in the parallel pool unchanged.
- tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py:
expose 'workdir' via the cronjob tool and 'hermes cron create/edit
--workdir ...'. Empty string on edit clears the field.
## Validation
- tests/cron/test_cron_workdir.py (21 tests): normalize, create, update,
JSON round-trip via cronjob tool, tick partition (workdir jobs run on
the main thread, not the pool), run_job env toggle + restore in finally.
- Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py,
test_config_cwd_bridge.py, test_worktree.py): 314/314 passed.
- Live smoke: hermes cron create --workdir $(pwd) works; relative path
rejected; list shows 'Workdir:'; edit --workdir '' clears.
agent/redact.py snapshots _REDACT_ENABLED from HERMES_REDACT_SECRETS at
module-import time. hermes_cli/main.py calls setup_logging() early, which
transitively imports agent.redact — BEFORE any config bridge has run. So
users who set 'security.redact_secrets: false' in config.yaml (instead of
HERMES_REDACT_SECRETS=false in .env) had the toggle silently ignored in
both 'hermes chat' and 'hermes gateway run'.
Bridge config.yaml -> env var in hermes_cli/main.py BEFORE setup_logging.
.env still wins (only set env when unset) — config.yaml is the fallback.
Regression tests in tests/hermes_cli/test_redact_config_bridge.py spawn
fresh subprocesses to verify:
- redact_secrets: false in config.yaml disables redaction
- default (key absent) leaves redaction enabled
- .env HERMES_REDACT_SECRETS=true overrides config.yaml
/model kimi-k2.6 on opencode-zen (or glm-5.1 on opencode-go) returned OpenCode's
website 404 HTML page when the user's persisted model.default was a Claude or
MiniMax model. The switched-to chat_completions request hit
https://opencode.ai/zen (or /zen/go) with no /v1 suffix.
Root cause: resolve_runtime_provider() computed api_mode from
model_cfg.get('default') instead of the model being requested. With a Claude
default, it resolved api_mode=anthropic_messages, stripped /v1 from base_url
(required for the Anthropic SDK), then switch_model()'s opencode_model_api_mode
override flipped api_mode back to chat_completions without restoring /v1.
Fix: thread an optional target_model kwarg through resolve_runtime_provider
and _resolve_runtime_from_pool_entry. When the caller is performing an explicit
mid-session model switch (i.e. switch_model()), the target model drives both
api_mode selection and the conditional /v1 strip. Other callers (CLI init,
gateway init, cron, ACP, aux client, delegate, account_usage, tui_gateway) pass
nothing and preserve the existing config-default behavior.
Regression tests added in test_model_switch_opencode_anthropic.py use the REAL
resolver (not a mock) to guard the exact Quentin-repro scenario. Existing tests
that mocked resolve_runtime_provider with 'lambda requested:' had their mock
signatures widened to '**kwargs' to accept the new kwarg.
When /model selects Custom but model.provider in YAML still reflects a prior provider, trust model.base_url only for loopback hosts or when provider is custom. Consult CUSTOM_BASE_URL before OpenRouter defaults (#14676).
OpenAI's OAuth token endpoint returns errors in a nested shape —
{"error": {"code": "refresh_token_reused", "message": "..."}} —
not the OAuth spec's flat {"error": "...", "error_description": "..."}.
The existing parser only handled the flat shape, so:
- `err.get("error")` returned a dict, the `isinstance(str)` guard
rejected it, and `code` stayed `"codex_refresh_failed"`.
- The dedicated `refresh_token_reused` branch (with its actionable
"re-run codex + hermes auth" message and `relogin_required=True`)
never fired.
- Users saw the generic "Codex token refresh failed with status 401"
when another Codex client (CLI, VS Code extension) had consumed
their single-use refresh token — giving no hint that re-auth was
required.
Parse both shapes, mapping OpenAI's nested `code`/`type` onto the
existing `code` variable so downstream branches (`refresh_token_reused`,
`invalid_grant`, etc.) fire correctly.
Add regression tests covering:
- nested `refresh_token_reused` → actionable message + relogin_required
- nested generic code → code + message surfaced
- flat OAuth-spec `invalid_grant` still handled (back-compat)
- unparseable body → generic fallback message, relogin_required=False
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Google AI Studio's free tier (<= 250 req/day for gemini-2.5-flash) is
exhausted in a handful of agent turns, so the setup wizard now refuses
to wire up Gemini when the supplied key is on the free tier, and the
runtime 429 handler appends actionable billing guidance.
Setup-time probe (hermes_cli/main.py):
- `_model_flow_api_key_provider` fires one minimal generateContent call
when provider_id == 'gemini' and classifies the response as
free/paid/unknown via x-ratelimit-limit-requests-per-day header or
429 body containing 'free_tier'.
- Free -> print block message, refuse to save the provider, return.
- Paid -> 'Tier check: paid' and proceed.
- Unknown (network/auth error) -> 'could not verify', proceed anyway.
Runtime 429 handler (agent/gemini_native_adapter.py):
- `gemini_http_error` appends billing guidance when the 429 error body
mentions 'free_tier', catching users who bypass setup by putting
GOOGLE_API_KEY directly in .env.
Tests: 21 unit tests for the probe + error path, 4 tests for the
setup-flow block. All 67 existing gemini tests still pass.
Make the main-branch test suite pass again. Most failures were tests
still asserting old shapes after recent refactors; two were real source
bugs.
Source fixes:
- tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every
shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel
measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty.
- hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150.
Test fixes (mostly stale mock targets / missing fixture fields):
- test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm
(the local name bound at import), not tools.terminal_tool.cleanup_vm.
- test_browser_camofox: patch tools.browser_camofox.load_config, not
hermes_cli.config.load_config (the source module, not the resolved one).
- test_flush_memories_codex._chat_response_with_memory_call: add
finish_reason, tool_call.id, tool_call.type so the chat_completions
transport normalizer doesn't AttributeError.
- test_concurrent_interrupt: polling_tool signature now accepts
messages= kwarg that _invoke_tool() passes through.
- test_minimax_provider: add _fallback_chain=[] to the __new__'d agent
so switch_model() doesn't AttributeError.
- test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working
after the scanner switched to agent.skill_utils.iter_skill_index_files
(os.walk-based). Point SKILLS_DIR at a real tmp_path and patch
agent.skill_utils.get_external_skills_dirs.
- test_browser_cdp_tool: browser_cdp toolset was intentionally split into
'browser-cdp' (commit 96b0f3700) so its stricter check_fn doesn't gate
the whole browser toolset; test now expects 'browser-cdp'.
- test_registry: add tools.browser_dialog_tool to the expected
builtin-discovery set (PR #14540 added it).
- test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint'
key on the JSON payload, not inline '[Hint: ...' text.
- test_write_deny test_hermes_env: resolve .env via get_hermes_home() so
the path matches the profile-aware denylist under hermetic HERMES_HOME.
- test_checkpoint_manager test_falls_back_to_parent: guard the walk-up
so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the
project root.
- test_quick_commands: set cli.session_id in the __new__'d CLI so the
alias-args path doesn't trip AttributeError when fuzzy-matching leaks
a skill command across xdist test distribution.
Keep Discord Copilot model switching responsive and current by refreshing picker data from the live catalog when possible, correcting the curated fallback list, and clearing stale controls before the switch completes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Xiaomi's API (api.xiaomimimo.com) requires lowercase model IDs like
"mimo-v2.5-pro" but rejects mixed-case names like "MiMo-V2.5-Pro"
that users copy from marketing docs or the ProviderEntry description.
Add _LOWERCASE_MODEL_PROVIDERS set and apply .lower() to model names
for providers in this set (currently just xiaomi) after stripping the
provider prefix. This ensures any case variant in config.yaml is
normalized before hitting the API.
Other providers (minimax, zai, etc.) are NOT affected — their APIs
accept mixed case (e.g. MiniMax-M2.7).
- Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in)
- Document DEFAULT_CONFIG and developer guide example
- Add unit tests for default, 1h, and invalid TTL fallback
Made-with: Cursor
Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.
Two gaps caused this:
1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
providers-dict branch (new-style) returned only name/base_url/api_key/
model and dropped api_mode. The legacy custom_providers-list branch
already propagated it correctly. The dict branch now parses and
returns api_mode via _parse_api_mode() in both match paths.
2. agent/auxiliary_client.py::resolve_provider_client — the named
custom provider block at ~L1740 ignored custom_entry['api_mode']
and unconditionally built an OpenAI client (only wrapping for
Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
otherwise plain OpenAI. An explicit task-level api_mode override
still wins over the provider entry's declared api_mode.
Fixes#15033
Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering
- providers-dict preserves valid api_mode
- invalid api_mode values are dropped
- missing api_mode leaves the entry unchanged (no regression)
- resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
api_mode=anthropic_messages
- full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
with an auxiliary.<task> override
- providers without api_mode still use the OpenAI-wire path
Follow-up to aeff6dfe:
- Fix semantic error in VALID_HOOKS inline comment ("after core auth" ->
"before auth"). Hook intentionally runs BEFORE auth so plugins can
handle unauthorized senders without triggering the pairing flow.
- Fix wrong class name in the same comment (HermesGateway ->
GatewayRunner, matching gateway/run.py).
- Add a full ### pre_gateway_dispatch section in
website/docs/user-guide/features/hooks.md (matches the pattern of
every other plugin hook: signature, params table, fires-where,
return-value table, use cases, two worked examples) plus a row in
the quick-reference table.
- Add the anchor link on the plugins.md table row so it matches the
other hook entries.
No code behavior change.
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:
{"action": "skip", "reason": "..."} -> drop (no reply)
{"action": "rewrite", "text": "..."} -> replace event.text
{"action": "allow"} / None -> normal dispatch
Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.
Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.
Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
(skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`
Made-with: Cursor
- hermes_cli/auth.py: add _default_verify() with macOS Homebrew certifi
fallback (mirrors weixin 3a0ec1d93). Extend env var chain to include
REQUESTS_CA_BUNDLE so one env var works across httpx + requests paths.
- agent/model_metadata.py: add _resolve_requests_verify() reading
HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE in priority
order. Apply explicit verify= to all 6 requests.get callsites.
- Tests: 18 new unit tests + autouse platform pin on existing
TestResolveVerifyFallback to keep its "returns True" assertions
platform-independent.
Empirically verified against self-signed HTTPS server: requests honors
REQUESTS_CA_BUNDLE only; httpx honors SSL_CERT_FILE only. Hermes now
honors all three everywhere.
Triggered by Discord reports — Nous OAuth SSL failure on macOS
Homebrew Python; custom provider self-signed cert ignored despite
REQUESTS_CA_BUNDLE set in env.
The aliases were added to hermes_cli/providers.py but auth.py has its own
_PROVIDER_ALIASES table inside resolve_provider() that is consulted before
PROVIDER_REGISTRY lookup. Without this, provider: alibaba_coding in
config.yaml (the exact repro from #14940) raised 'Unknown provider'.
Mirror the three aliases into auth.py so resolve_provider() accepts them.
The alibaba-coding-plan provider (coding-intl.dashscope.aliyuncs.com/v1)
was not registered in providers.py or auth.py. When users set
provider: alibaba_coding or provider: alibaba-coding-plan in config.yaml,
Hermes could not resolve the credentials and fell back to OpenRouter
or rejected the request with HTTP 401/402 (issue #14940).
Changes:
- providers.py: add HermesOverlay for alibaba-coding-plan with
ALIBABA_CODING_PLAN_BASE_URL env var support
- providers.py: add aliases alibaba_coding, alibaba-coding,
alibaba_coding_plan -> alibaba-coding-plan
- auth.py: add ProviderConfig for alibaba-coding-plan with:
- inference_base_url: https://coding-intl.dashscope.aliyuncs.com/v1
- api_key_env_vars: ALIBABA_CODING_PLAN_API_KEY, DASHSCOPE_API_KEY
Fixes#14940
Wrap the existing version label in the welcome-banner panel title
('Hermes Agent v… · upstream … · local …') with an OSC-8 terminal
hyperlink pointing at the latest git tag's GitHub release page
(https://github.com/NousResearch/hermes-agent/releases/tag/<tag>).
Clickable in modern terminals (iTerm2, WezTerm, Windows Terminal,
GNOME Terminal, Kitty, etc.); degrades to plain text on terminals
without OSC-8 support. No new line added to the banner.
New get_latest_release_tag() helper runs 'git describe --tags
--abbrev=0' in the Hermes checkout (3s timeout, per-process cache,
silent fallback for non-git/pip installs and forks without tags).
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
* feat(config): make tool output truncation limits configurable
Port from anomalyco/opencode#23770: expose a new `tool_output` config
section so users can tune the hardcoded truncation caps that apply to
terminal output and read_file pagination.
Three knobs under `tool_output`:
- max_bytes (default 50_000) — terminal stdout/stderr cap
- max_lines (default 2000) — read_file pagination cap
- max_line_length (default 2000) — per-line cap in line-numbered view
All three keep their existing hardcoded values as defaults, so behaviour
is unchanged when the section is absent. Power users on big-context
models can raise them; small-context local models can lower them.
Implementation:
- New `tools/tool_output_limits.py` reads the section with defensive
fallback (missing/invalid values → defaults, never raises).
- `tools/terminal_tool.py` MAX_OUTPUT_CHARS now comes from
get_max_bytes().
- `tools/file_operations.py` normalize_read_pagination() and
_add_line_numbers() now pull the limits at call time.
- `hermes_cli/config.py` DEFAULT_CONFIG gains the `tool_output` section
so `hermes setup` writes defaults into fresh configs.
- Docs page `user-guide/configuration.md` gains a "Tool Output
Truncation Limits" section with large-context and small-context
example configs.
Tests (18 new in tests/tools/test_tool_output_limits.py):
- Default resolution with missing / malformed / non-dict config.
- Full and partial user overrides.
- Coercion of bad values (None, negative, wrong type, str int).
- Shortcut accessors delegate correctly.
- DEFAULT_CONFIG exposes the section with the right defaults.
- Integration: normalize_read_pagination clamps to the configured
max_lines.
* feat(skills): add design-md skill for Google's DESIGN.md spec
Built-in skill under skills/creative/ that teaches the agent to author,
lint, diff, and export DESIGN.md files — Google's open-source
(Apache-2.0) format for describing a visual identity to coding agents.
Covers:
- YAML front matter + markdown body anatomy
- Full token schema (colors, typography, rounded, spacing, components)
- Canonical section order + duplicate-heading rejection
- Component property whitelist + variants-as-siblings pattern
- CLI workflow via 'npx @google/design.md' (lint/diff/export/spec)
- Lint rule reference including WCAG contrast checks
- Common YAML pitfalls (quoted hex, negative dimensions, dotted refs)
- Starter template at templates/starter.md
Package verified live on npm (@google/design.md@0.1.1).
Crash-log stack trace (tui_gateway_crash.log) from the user's session
pinned the regression: SIGPIPE arrived while main thread was blocked on
for-raw-in-sys.stdin — i.e., a background thread (debug print to stderr,
most likely from HERMES_VOICE_DEBUG=1) wrote to a pipe whose buffer the
TUI hadn't drained yet, and SIG_DFL promptly killed the process.
Two fixes that together restore CLI parity:
- entry.py: SIGPIPE → SIG_IGN instead of the _log_signal handler that
then exited. With SIG_IGN, Python raises BrokenPipeError on the
offending write, which write_json already handles with a clean exit
via _log_exit. SIGTERM / SIGHUP still route through _log_signal so
real termination signals remain diagnosable.
- hermes_cli/voice.py:_debug: wrap the stderr print in a BrokenPipeError
/ OSError try/except. This runs from daemon threads (silence callback,
TTS playback, beep), so a broken stderr must not escape and ride up
into the main event loop.
Verified by spawning the gateway subprocess locally:
voice.toggle status → 200 OK, process stays alive, clean exit on
stdin close logs "reason=stdin EOF" instead of a silent reap.
TTS feedback loop (hermes_cli/voice.py)
The VAD loop kept the microphone live while speak_text played the
agent's reply over the speakers, so the reply itself was picked up,
transcribed, and submitted — the agent then replied to its own echo
("Ha, looks like we're in a loop").
Ported cli.py:_voice_tts_done synchronisation:
- _tts_playing: threading.Event (initially set = "not playing").
- speak_text cancels the active recorder before opening the speakers,
clears _tts_playing, and on exit waits 300 ms before re-starting the
recorder — long enough for the OS audio device to settle so afplay
and sounddevice don't race for it.
- _continuous_on_silence now waits on _tts_playing (up to 60 s) before
re-arming the mic with another 300 ms gap, mirroring
cli.py:10619-10621. If the user flips voice off during the wait the
loop exits cleanly instead of fighting for the device.
Without both halves the loop races: if the silence callback fires
before TTS starts it re-arms immediately; if TTS is already playing
the pause-and-resume path catches it.
Red REC badge (ui-tui appChrome + useMainApp)
Classic CLI (cli.py:_get_voice_status_fragments) renders "● REC" in
red and "◉ STT" in amber. TUI was showing a dim "REC" with no dot,
making it hard to spot at a glance. voiceLabel now emits the same
glyphs and appChrome colours them via t.color.error / t.color.warn,
falling back to dim for the idle label.
Three issues surfaced during end-to-end testing of the CLI-parity voice
loop and are fixed together because they all blocked "speak → agent
responds → TTS reads it back" from working at all:
1. Wrong result key (hermes_cli/voice.py)
transcribe_recording() returns {"success": bool, "transcript": str},
matching cli.py:_voice_stop_and_transcribe. The wrapper was reading
result.get("text"), which is None, so every successful Groq / local
STT response was thrown away and the 3-strikes halt fired after
three silent-looking cycles. Fixed by reading "transcript" and also
honouring "success" like the CLI does. Updated the loop simulation
tests to return the correct shape.
2. TTS speak-back was missing (tui_gateway/server.py + hermes_cli/voice.py)
The TUI had a voice.toggle "tts" subcommand but nothing downstream
actually read the flag — agent replies never spoke. Mirrored
cli.py:8747-8754's dispatch: on message.complete with status ==
"complete", if _voice_tts_enabled() is true, spawn a daemon thread
running speak_text(response). Rewrote speak_text as a full port of
cli.py:_voice_speak_response — same markdown-strip regex pipeline
(code blocks, links, bold/italic, inline code, headers, list bullets,
horizontal rules, excessive newlines), same 4000-char cap, same
explicit mp3 output path, same MP3-over-OGG playback choice (afplay
misbehaves on OGG), same cleanup of both extensions. Keeps TUI TTS
audible output byte-for-byte identical to the classic CLI.
3. Auto-submit swallowed on non-empty composer (createGatewayEventHandler.ts)
The voice.transcript handler branched on prev input via a setInput
updater and fired submitRef.current inside the updater when prev was
empty. React strict mode double-invokes state updaters, which would
queue the submit twice; and when the composer had any content the
transcript was merely appended — the agent never saw it. CLI
_pending_input.put(transcript) unconditionally feeds the transcript
as the next turn, so match that: always clear the composer and
setTimeout(() => submitRef.current(text), 0) outside any updater.
Side effect can't run twice this way, and a half-typed draft on the
rare occasion is a fair trade vs. silently dropping the turn.
Also added peak_rms to the rec.stop debug line so "recording too quiet"
is diagnosable at a glance when HERMES_VOICE_DEBUG=1.
The TUI had drifted from the CLI's voice model in two ways:
- /voice on was lighting up the microphone immediately and Ctrl+B was
interpreted as a mode toggle. The CLI separates the two: /voice on
just flips the umbrella bit, recording only starts once the user
presses Ctrl+B, which also sets _voice_continuous so the VAD loop
auto-restarts until the user presses Ctrl+B again or three silent
cycles pass.
- /voice tts was missing entirely, so users couldn't turn agent reply
speech on/off from inside the TUI.
This commit brings the TUI to parity.
Python
- hermes_cli/voice.py: continuous-mode API (start_continuous,
stop_continuous, is_continuous_active) layered on the existing PTT
wrappers. The silence callback transcribes, fires on_transcript,
tracks consecutive no-speech cycles, and auto-restarts — mirroring
cli.py:_voice_stop_and_transcribe + _restart_recording.
- tui_gateway/server.py:
- voice.toggle now supports on / off / tts / status. The umbrella
bit lives in HERMES_VOICE + display.voice_enabled; tts lives in
HERMES_VOICE_TTS + display.voice_tts. /voice off also tears down
any active continuous loop so a toggle-off really releases the
microphone.
- voice.record start/stop now drives start_continuous/stop_continuous.
start is refused with a clear error when the mode is off, matching
cli.py:handle_voice_record's early return on `not _voice_mode`.
- New voice.transcript / voice.status events emit through
_voice_emit (remembers the sid that last enabled the mode so
events land in the right session).
TypeScript
- gatewayTypes.ts: voice.status + voice.transcript event
discriminants; VoiceToggleResponse gains tts; VoiceRecordResponse
gains status for the new "started/stopped" responses.
- interfaces.ts: GatewayEventHandlerContext gains composer.setInput +
submission.submitRef + voice.{setRecording, setProcessing,
setVoiceEnabled}; InputHandlerContext.voice gains enabled +
setVoiceEnabled for the mode-aware Ctrl+B handler.
- createGatewayEventHandler.ts: voice.status drives REC/STT badges;
voice.transcript auto-submits when the composer is empty (CLI
_pending_input.put parity) and appends when a draft is in flight.
no_speech_limit flips voice off + sys line.
- useInputHandlers.ts: Ctrl+B now calls voice.record (start/stop),
not voice.toggle, and nudges the user with a sys line when the
mode is off instead of silently flipping it on.
- useMainApp.ts: wires the new event-handler context fields.
- slash/commands/session.ts: /voice handles on / off / tts / status
with CLI-matching output ("voice: mode on · tts off").
Backward compat preserved for voice.record (was always PTT shape;
gateway still honours start/stop with mode-gating added).
tui_gateway/server.py:3486/3491/3509 imports start_recording,
stop_and_transcribe, and speak_text from hermes_cli.voice, but the
module never existed (not in git history — never shipped, never
deleted). Every voice.record / voice.tts RPC call hit the ImportError
branch and the TUI surfaced it as "voice module not available — install
audio dependencies" even on boxes with sounddevice / faster-whisper /
numpy installed.
Adds a thin wrapper on top of tools.voice_mode (recording +
transcription) and tools.tts_tool (text-to-speech):
- start_recording() — idempotent; stores the active AudioRecorder in a
module-global guarded by a Lock so repeat Ctrl+B presses don't fight
over the mic.
- stop_and_transcribe() — returns None for no-op / no-speech /
Whisper-hallucination cases so the TUI's existing "no speech detected"
path keeps working unchanged.
- speak_text(text) — lazily imports tts_tool (optional provider SDKs
stay unloaded until the first /voice tts call), parses the tool's
JSON result, and plays the audio via play_audio_file.
Paired with the Ctrl+B keybinding fix in the prior commit, the TUI
voice pipeline now works end-to-end for the first time.
The 300s default was too tight for high-reasoning models on non-trivial
delegated tasks — e.g. gpt-5.5 xhigh reviewing 12 files would burn >5min
on reasoning tokens before issuing its first tool call, tripping the
hard wall-clock timeout with 0 api_calls logged.
- tools/delegate_tool.py: DEFAULT_CHILD_TIMEOUT 300 -> 600
- hermes_cli/config.py: surface delegation.child_timeout_seconds in
DEFAULT_CONFIG so it's discoverable (previously the key was read by
_get_child_timeout() but absent from the default config schema)
Users can still override via config.yaml delegation.child_timeout_seconds
or DELEGATION_CHILD_TIMEOUT_SECONDS env var (floor 30s, no ceiling).
Cron now resolves its toolset from the same per-platform config the
gateway uses — `_get_platform_tools(cfg, 'cron')` — instead of blindly
loading every default toolset. Existing cron jobs without a per-job
override automatically lose `moa`, `homeassistant`, and `rl` (the
`_DEFAULT_OFF_TOOLSETS` set), which stops the "surprise $4.63
mixture_of_agents run" class of bug (Norbert, Discord).
Precedence inside `run_job`:
1. per-job `enabled_toolsets` (PR #14767 / #6130) — wins if set
2. `_get_platform_tools(cfg, 'cron')` — new, the blanket gate
3. `None` fallback (legacy) — only on resolver exception
Changes:
- hermes_cli/platforms.py: register 'cron' with default_toolset
'hermes-cron'
- toolsets.py: add 'hermes-cron' toolset (mirrors 'hermes-cli';
`_get_platform_tools` then filters via `_DEFAULT_OFF_TOOLSETS`)
- cron/scheduler.py: add `_resolve_cron_enabled_toolsets(job, cfg)`,
call it at the `AIAgent(...)` kwargs site
- tests/cron/test_scheduler.py: replace the 'None when not set' test
(outdated contract) with an invariant ('moa not in default cron
toolset') + new per-job-wins precedence test
- tests/hermes_cli/test_tools_config.py: mark 'cron' as non-messaging
in the gateway-toolset-coverage test
Themes and plugins can now pull off arbitrary dashboard reskins (cockpit
HUD, retro terminal, etc.) without touching core code.
Themes gain four new fields:
- layoutVariant: standard | cockpit | tiled — shell layout selector
- assets: {bg, hero, logo, crest, sidebar, header, custom: {...}} —
artwork URLs exposed as --theme-asset-* CSS vars
- customCSS: raw CSS injected as a scoped <style> tag on theme apply
(32 KiB cap, cleaned up on theme switch)
- componentStyles: per-component CSS-var overrides (clipPath,
borderImage, background, boxShadow, ...) for card/header/sidebar/
backdrop/tab/progress/badge/footer/page
Plugin manifests gain three new fields:
- tab.override: replaces a built-in route instead of adding a tab
- tab.hidden: register component + slots without adding a nav entry
- slots: declares shell slots the plugin populates
10 named shell slots: backdrop, header-left/right/banner, sidebar,
pre-main, post-main, footer-left/right, overlay. Plugins register via
window.__HERMES_PLUGINS__.registerSlot(name, slot, Component). A
<PluginSlot> React helper is exported on the plugin SDK.
Ships a full demo at plugins/strike-freedom-cockpit/ — theme YAML +
slot-only plugin that reproduces a Gundam cockpit dashboard: MS-STATUS
sidebar with live telemetry, COMPASS crest in header, notched card
corners via componentStyles, scanline overlay via customCSS, gold/cyan
palette, Orbitron typography.
Validation:
- 15 new tests in test_web_server.py covering every extended field
- tests/hermes_cli/: 2615 passed (3 pre-existing unrelated failures)
- tsc -b --noEmit: clean
- vite build: 418 kB bundle, ~2 kB delta for slots/theme extensions
Co-authored-by: Teknium <p@nousresearch.com>
float(os.getenv(...)) at module level raises ValueError on any
non-numeric value, crashing the web server at import before it starts.
Wrap in try/except with a warning log and fallback to 3.0s.
cmd_update no longer SIGKILLs in-flight agent runs, and users get
'still working' status every 3 min instead of 10. Two long-standing
sources of '@user — agent gives up mid-task' reports on Telegram and
other gateways.
Drain-aware update:
- New helper hermes_cli.gateway._graceful_restart_via_sigusr1(pid,
drain_timeout) sends SIGUSR1 to the gateway and polls os.kill(pid,
0) until the process exits or the budget expires.
- cmd_update's systemd loop now reads MainPID via 'systemctl show
--property=MainPID --value' and tries the graceful path first. The
gateway's existing SIGUSR1 handler -> request_restart(via_service=
True) -> drain -> exit(75) is wired in gateway/run.py and is
respawned by systemd's Restart=on-failure (and the explicit
RestartForceExitStatus=75 on newer units).
- Falls back to 'systemctl restart' when MainPID is unknown, the
drain budget elapses, or the unit doesn't respawn after exit (older
units missing Restart=on-failure). Old install behavior preserved.
- Drain budget = max(restart_drain_timeout, 30s) + 15s margin so the
drain loop in run_agent + final exit have room before fallback
fires. Composes with #14728's tool-subprocess reaping.
Notification interval:
- agent.gateway_notify_interval default 600 -> 180.
- HERMES_AGENT_NOTIFY_INTERVAL env-var fallback in gateway/run.py
matched.
- 9-minute weak-model spinning runs now ping at 3 min and 6 min
instead of 27 seconds before completion, removing the 'is the bot
dead?' reflex that drives gateway-restart cycles.
Tests:
- Two new tests in tests/hermes_cli/test_update_gateway_restart.py:
one asserts SIGUSR1 is sent and 'systemctl restart' is NOT called
when MainPID is known and the helper succeeds; one asserts the
fallback fires when the helper returns False.
- E2E: spawned detached bash processes confirm the helper returns
True on SIGUSR1-handling exit (~0.5s) and False on SIGUSR1-ignoring
processes (timeout). Verified non-existent PID and pid=0 edge cases.
- 41/41 in test_update_gateway_restart.py (was 39, +2 new).
- 154/154 in shutdown-related suites including #14728's new tests.
Reported by @GeoffWellman and @ANT_1515 on X.
Closes#11616.
The agent's API retry loop hardcoded max_retries = 3, so users with
fallback providers on flaky primaries burned through ~3 × provider
timeout (e.g. 3 × 180s = 9 minutes) before their fallback chain got a
chance to kick in.
Expose a new config key:
agent:
api_max_retries: 3 # default unchanged
Set it to 1 for fast failover when you have fallback providers, or
raise it if you prefer longer tolerance on a single provider. Values
< 1 are clamped to 1 (single attempt, no retry); non-integer values
fall back to the default.
This wraps the Hermes-level retry loop only — the OpenAI SDK's own
low-level retries (max_retries=2 default) still run beneath this for
transient network errors.
Changes:
- hermes_cli/config.py: add agent.api_max_retries default 3 with comment.
- run_agent.py: read self._api_max_retries in AIAgent.__init__; replace
hardcoded max_retries = 3 in the retry loop with self._api_max_retries.
- cli-config.yaml.example: documented example entry.
- hermes_cli/tips.py: discoverable tip line.
- tests/run_agent/test_api_max_retries_config.py: 4 tests covering
default, override, clamp-to-one, and invalid-value fallback.
Closes#8202.
Root cause: stop() reclaimed tool-call bash/sleep children only at the
very end of the shutdown sequence — after a 60s drain, 5s interrupt
grace, and per-adapter disconnect. Under systemd (TimeoutStopSec bounded
by drain_timeout), that meant the cgroup SIGKILL escalation fired first,
and systemd reaped the bash/sleep children instead of us.
Fix:
- Extract tool-subprocess cleanup into a local helper
_kill_tool_subprocesses() in _stop_impl().
- Invoke it eagerly right after _interrupt_running_agents() on the
drain-timeout path, before adapter disconnect.
- Keep the existing catch-all call at the end for the graceful path
and defense in depth against mid-teardown respawns.
- Bump generated systemd unit TimeoutStopSec to drain_timeout + 30s
so cleanup + disconnect + DB close has headroom above the drain
budget, matching the 'subprocess timeout > TimeoutStopSec + margin'
rule from the skill.
Tests:
- New: test_gateway_stop_kills_tool_subprocesses_before_adapter_disconnect_on_timeout
asserts kill_all() runs before disconnect() when drain times out.
- New: test_gateway_stop_kills_tool_subprocesses_on_graceful_path
guards that the final catch-all still fires when drain succeeds
(regression guard against accidental removal during refactor).
- Updated: existing systemd unit generator tests expect TimeoutStopSec=90
(= 60s drain + 30s headroom) with explanatory comment.
A test in tests/agent/test_credential_pool.py
(test_try_refresh_current_updates_only_current_entry) monkeypatched
refresh_codex_oauth_pure() to return the literal fixture strings
'access-new'/'refresh-new', then executed the real production code path
in agent/credential_pool.py::try_refresh_current which calls
_sync_device_code_entry_to_auth_store → _save_provider_state → writes
to `providers.openai-codex.tokens`. That writer resolves the target via
get_hermes_home()/auth.json. If the test ran with HERMES_HOME unset (direct
pytest invocation, IDE runner bypassing conftest discovery, or any other
sandbox escape), it would overwrite the real user's auth store with the
fixture strings.
Observed in the wild: Teknium's ~/.hermes/auth.json providers.openai-codex.tokens
held 'access-new'/'refresh-new' for five days. His CLI kept working because
the credential_pool entries still held real JWTs, but `hermes model`'s live
discovery path (which reads via resolve_codex_runtime_credentials →
_read_codex_tokens → providers.tokens) was silently 401-ing.
Fixes:
- Delete test_try_refresh_current_updates_only_current_entry. It was the
only test that exercised a writer hitting providers.openai-codex.tokens
with literal stub tokens. The entry-level rotation behavior it asserted
is still covered by test_mark_exhausted_and_rotate_persists_status above.
- Add a seat belt in hermes_cli.auth._auth_file_path(): if PYTEST_CURRENT_TEST
is set AND the resolved path equals the real ~/.hermes/auth.json, raise
with a clear message. In production (no PYTEST_CURRENT_TEST), a single
dict lookup. Any future test that forgets to monkeypatch HERMES_HOME
fails loudly instead of corrupting the user's credentials.
Validation:
- production (no PYTEST_CURRENT_TEST): returns real path, unchanged behavior
- pytest + HERMES_HOME unset (points at real home): raises with message
- pytest + HERMES_HOME=/tmp/...: returns tmp path, tests pass normally
Dashboard themes now control typography and layout, not just colors.
Each built-in theme picks its own fonts, base size, radius, and density
so switching produces visible changes beyond hue.
Schema additions (per theme):
- typography — fontSans, fontMono, fontDisplay, fontUrl, baseSize,
lineHeight, letterSpacing. fontUrl is injected as <link> on switch
so Google/Bunny/self-hosted stylesheets all work.
- layout — radius (any CSS length) and density
(compact | comfortable | spacious, multiplies Tailwind spacing).
- colorOverrides (optional) — pin individual shadcn tokens that would
otherwise derive from the palette.
Built-in themes are now distinct beyond palette:
- default — system stack, 15px, 0.5rem radius, comfortable
- midnight — Inter + JetBrains Mono, 14px, 0.75rem, comfortable
- ember — Spectral (serif) + IBM Plex Mono, 15px, 0.25rem
- mono — IBM Plex Sans + Mono, 13px, 0 radius, compact
- cyberpunk— Share Tech Mono everywhere, 14px, 0 radius, compact
- rose — Fraunces (serif) + DM Mono, 16px, 1rem, spacious
Also fixes two bugs:
1. Custom user themes silently fell back to default. ThemeProvider
only applied BUILTIN_THEMES[name], so YAML files in
~/.hermes/dashboard-themes/ showed in the picker but did nothing.
Server now ships the full normalised definition; client applies it.
2. Docs documented a 21-token flat colors schema that never matched
the code (applyPalette reads a 3-layer palette). Rewrote the
Themes section against the actual shape.
Implementation:
- web/src/themes/types.ts: extend DashboardTheme with typography,
layout, colorOverrides; ThemeListEntry carries optional definition.
- web/src/themes/presets.ts: 6 built-ins with distinct typography+layout.
- web/src/themes/context.tsx: applyTheme() writes palette+typography+
layout+overrides as CSS vars, injects fontUrl stylesheet, fixes the
fallback-to-default bug via resolveTheme(name).
- web/src/index.css: html/body/code read the new theme-font vars;
--radius-sm/md/lg/xl derive from --theme-radius; --spacing scales
with --theme-spacing-mul so Tailwind utilities shift with density.
- hermes_cli/web_server.py: _normalise_theme_definition() parses loose
YAML (bare hex strings, partial blocks) into the canonical wire
shape; /api/dashboard/themes ships full definitions for user themes.
- tests/hermes_cli/test_web_server.py: 16 new tests covering the
normaliser and discovery (rejection cases, clamping, defaults).
- website/docs/user-guide/features/web-dashboard.md: rewrite Themes
section with real schema, per-model tables, full YAML example.
OpenAI launched GPT-5.5 on Codex today (Apr 23 2026). Adds it to the static
catalog and pipes the user's OAuth access token into the openai-codex path of
provider_model_ids() so /model mid-session and the gateway picker hit the
live ChatGPT codex/models endpoint — new models appear for each user
according to what ChatGPT actually lists for their account, without a Hermes
release.
Verified live: 'gpt-5.5' returns priority 0 (featured) from the endpoint,
400k context per OpenAI's launch article. 'hermes chat --provider
openai-codex --model gpt-5.5' completes end-to-end.
Changes:
- hermes_cli/codex_models.py: add gpt-5.5 to DEFAULT_CODEX_MODELS + forward-compat
- agent/model_metadata.py: 400k context length entry
- hermes_cli/models.py: resolve codex OAuth token before calling
get_codex_model_ids() in provider_model_ids('openai-codex')
Three bugs fixed in model alias resolution:
1. resolve_alias() returned the FIRST catalog match with no version
preference. '/model mimo' picked mimo-v2-omni (index 0 in dict)
instead of mimo-v2.5-pro. Now collects all prefix matches, sorts
by version descending with pro/max ranked above bare names, and
returns the highest.
2. models.dev registry missing newly added models (e.g. v2.5 for
native xiaomi). resolve_alias() now merges static _PROVIDER_MODELS
entries into the catalog so models resolve immediately without
waiting for models.dev to sync.
3. hermes model picker showed only models.dev results (3 xiaomi models),
hiding curated entries (5 total). The picker now merges curated
models into the models.dev list so all models appear.
Also fixes a trailing-dot float parsing edge case in _model_sort_key
where '5.4.' failed float() and multi-dot versions like '5.4.1'
weren't parsed correctly.
## Merged
Adds MiMo v2.5-pro and v2.5 support to Xiaomi native provider, OpenCode Go, and setup wizard.
### Changes
- Context lengths: added v2.5-pro (1M) and v2.5 (1M), corrected existing MiMo entries to exact values (262144)
- Provider lists: xiaomi, opencode-go, setup wizard
- Vision: upgraded from mimo-v2-omni to mimo-v2.5 (omnimodal)
- Config description updated for XIAOMI_API_KEY
- Tests updated for new vision model preference
### Verification
- 4322 tests passed, 0 new regressions
- Live API tested on Xiaomi portal: basic, reasoning, tool calling, multi-tool, file ops, system prompt, vision — all pass
- Self-review found and fixed 2 issues (redundant vision check, stale HuggingFace context length)
Replaces the blanket 'always allow' change from the previous commit with
an opt-in config flag so users who want belt-and-suspenders security can
still get the keyword scan on skill_manage output.
## Default behavior (flag off)
skill_manage(action='create'|'edit'|'patch') no longer runs the keyword
scanner. The agent can write skills that mention risky keywords in prose
(documenting what reviewers should watch for, describing cache-bust
semantics in a PR-review skill, referencing AGENTS.md, etc.) without
getting blocked.
Rationale: the agent can already execute the same code paths via
terminal() with no gate, so the scan adds friction without meaningful
security against a compromised or malicious agent.
## Opt-in behavior (flag on)
Set skills.guard_agent_created: true in config.yaml to get the original
behavior back. Scanner runs on every skill_manage write; dangerous
verdicts surface as a tool error the agent can react to (retry without
the flagged content).
## External hub installs unaffected
trusted/community sources (hermes skills install) always get scanned
regardless of this flag. The gate is specifically for skill_manage,
which only agents call.
## Changes
- hermes_cli/config.py: add skills.guard_agent_created: False to DEFAULT_CONFIG
- tools/skill_manager_tool.py: _guard_agent_created_enabled() reads the flag;
_security_scan_skill() short-circuits to None when the flag is off
- tools/skills_guard.py: restore INSTALL_POLICY['agent-created'] =
('allow', 'allow', 'ask') so the scan remains strict when it does run
- tests/tools/test_skills_guard.py: restore original ask/force tests
- tests/tools/test_skill_manager_tool.py: new TestSecurityScanGate class
covering both flag states + config error handling
## Validation
- tests/tools/test_skills_guard.py + test_skill_manager_tool.py: 115/115 pass
- E2E: flagged-keyword skill creates with default config, blocks with flag on
The environment-snapshot login shell was auto-sourcing only ~/.bashrc when
building the PATH snapshot. On Debian/Ubuntu the default ~/.bashrc starts
with a non-interactive short-circuit:
case $- in *i*) ;; *) return;; esac
Sourcing it from a non-interactive shell returns before any PATH export
below that guard runs. Node version managers like n and nvm append their
PATH line under that guard, so Hermes was capturing a PATH without
~/n/bin — and the terminal tool saw 'node: command not found' even when
node was on the user's interactive shell PATH.
Expand the auto-source list (when auto_source_bashrc is on) to:
~/.profile → ~/.bash_profile → ~/.bashrc
~/.profile and ~/.bash_profile have no interactivity guard — installers
that write their PATH there (n's n-install, nvm's curl installer on most
setups) take effect. ~/.bashrc still runs last to preserve behaviour for
users who put PATH logic there without the guard.
Added two tests covering the new behaviour plus an E2E test that spins up
a real LocalEnvironment with a guard-prefixed ~/.bashrc and a ~/.profile
PATH export, and verifies the captured snapshot PATH contains the profile
entry.
On fresh RHEL/Debian SSH sessions without linger, `systemctl --user
start hermes-gateway` fails with 'Failed to connect to bus: No medium
found' because /run/user/$UID/bus doesn't exist. Setup previously
showed a raw CalledProcessError and continued claiming success, so the
gateway never actually started.
systemd_start() and systemd_restart() now call _preflight_user_systemd()
for the user scope first:
- Bus socket already there → no-op (desktop / linger-enabled servers)
- Linger off → try loginctl enable-linger (works when polkit permits,
needs sudo otherwise), wait for socket
- Still unreachable → raise UserSystemdUnavailableError with a clean
remediation message pointing to sudo loginctl + hermes gateway run
as the foreground fallback
Setup's start/restart handlers and gateway_command() catch the new
exception and render the multi-line guidance instead of a traceback.
Multiple custom_providers entries sharing the same base_url + api_key
are now grouped into a single picker row. A local Ollama host with
per-model display names ("Ollama — GLM 5.1", "Ollama — Qwen3-coder",
"Ollama — Kimi K2", "Ollama — MiniMax M2.7") previously produced four
near-duplicate picker rows that differed only by suffix; now it appears
as one "Ollama" row with four models.
Key changes:
- Grouping key changed from slug-by-name to (base_url, api_key). Names
frequently differ per model while the endpoint stays the same.
- When the grouped endpoint matches current_base_url, the row's slug is
set to current_provider so picker-driven switches route through the
live credential pipeline (no re-resolution needed).
- Per-model suffix is stripped from the display name ("Ollama — X" →
"Ollama") via em-dash / " - " separators.
- Two groups with different api_keys at the same base_url (or otherwise
colliding on cleaned name) are disambiguated with a numeric suffix
(custom:openai, custom:openai-2) so both stay visible.
- current_base_url parameter plumbed through both gateway call sites.
Existing #8216, #11499, #13509 regressions covered (dict/list shapes
of models:, section-3/section-4 dedup, normalized list-format entries).
Salvaged from @davidvv's PR #9210 — the underlying code had diverged
~1400 commits since that PR was opened, so this is a reconstruction of
the same approach on current main rather than a clean cherry-pick.
Authorship preserved via --author on this commit.
Closes#9210
_normalize_custom_provider_entry silently drops the models field when it's
a list. Hand-edited configs (and the shape used by older Hermes versions)
still write models as a plain list of ids, so after the normalize pass the
entry reaches list_authenticated_providers() with no models and /model
shows the provider with (0) models — even though the underlying picker
code handles lists fine.
Convert list-format models into the empty-value dict shape the rest of
the pipeline already expects. Dict-format entries keep passing through
unchanged.
Repro (before the fix):
custom_providers:
- name: acme
base_url: https://api.example.com/v1
models: [foo, bar, baz]
/model shows "acme (0)"; bypassing normalize in list_authenticated_providers
returns three models, confirming the drop happens in normalize.
Adds four unit tests covering list→dict conversion, dict pass-through,
filtering of empty/non-string entries, and the empty-list case.
When user_name is stored as None (e.g. Telegram users without a
display name), dict.get('user_name', '') returns None because the
key exists — the default is only used for missing keys. This causes
a TypeError when the format specifier :<20 is applied to None.
Use `or ''` to coerce None to an empty string.
Fixes#7392
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In list_authenticated_providers(), providers like qwen-oauth that use
OAuth authentication were incorrectly flagged as authenticated because
the env-var check fell back to models.dev provider env vars (e.g.
DASHSCOPE_API_KEY for alibaba). Any user with an alibaba API key would
see a ghost qwen-oauth entry in /model picker with 0 models listed.
Fix: skip providers whose auth_type is not api_key in the env-var
detection section (step 1). OAuth/external-process providers are
properly handled in step 2 (HERMES_OVERLAYS) which checks the auth store.
Port from openai/codex#18646.
Adds two flags to 'hermes chat' that fully isolate a run from user-level
configuration and rules:
* --ignore-user-config: skip ~/.hermes/config.yaml and fall back to
built-in defaults. Credentials in .env are still loaded so the agent
can actually call a provider.
* --ignore-rules: skip auto-injection of AGENTS.md, SOUL.md,
.cursorrules, and persistent memory (maps to AIAgent(skip_context_files=True,
skip_memory=True)).
Primary use cases:
- Reproducible CI runs that should not pick up developer-local config
- Third-party integrations (e.g. Chronicle in Codex) that bring their
own config and don't want user preferences leaking in
- Bug-report reproduction without the reporter's personal overrides
- Debugging: bisect 'was it my config?' vs 'real bug' in one command
Both flags are registered on the parent parser AND the 'chat' subparser
(with argparse.SUPPRESS on the subparser to avoid overwriting the parent
value when the flag is placed before the subcommand, matching the
existing --yolo/--worktree/--pass-session-id pattern).
Env vars HERMES_IGNORE_USER_CONFIG=1 and HERMES_IGNORE_RULES=1 are set
by cmd_chat BEFORE 'from cli import main' runs, which is critical
because cli.py evaluates CLI_CONFIG = load_cli_config() at module import
time. The cli.py / hermes_cli.config.load_cli_config() function checks
the env var and skips ~/.hermes/config.yaml when set.
Tests: 11 new tests in tests/hermes_cli/test_ignore_user_config_flags.py
covering the env gate, constructor wiring, cmd_chat simulation, and
argparse flag registration. All pass; existing hermes_cli + cli suites
unaffected (3005 pass, 2 pre-existing unrelated failures).
When a user manually sets fallback_model as a YAML list instead of a
dict, save_config_value() crashes with:
AttributeError: 'list' object has no attribute 'get'
at the fb.get('provider') call on hermes_cli/config.py.
The fix adds isinstance(fb, dict) so list-format values are treated as
unconfigured — the fallback_model comment block is appended to guide
correct usage — instead of crashing.
Fixes#4091
Co-authored-by: [AI-assisted — Claude Sonnet 4.6 via Milo/Hermes]
The Anthropic provider entry in PROVIDER_REGISTRY is the only standard
API-key provider missing a base_url_env_var. This causes the credential
pool to hardcode base_url to https://api.anthropic.com, ignoring
ANTHROPIC_BASE_URL from the environment.
When using a proxy (e.g. LiteLLM, custom gateway), subagent delegation
fails with 401 because:
1. _seed_from_env() creates pool entries with the hardcoded base_url
2. On error recovery, _swap_credential() overwrites the child agent's
proxy URL with the pool entry's api.anthropic.com
3. The proxy API key is sent to real Anthropic → authentication_error
Adding base_url_env_var="ANTHROPIC_BASE_URL" aligns Anthropic with the
20+ other providers that already have this field set (alibaba, gemini,
deepseek, xai, etc.).
Follow-up on helix4u's PR #14211:
- Flip default to true: narrowing toolsets=['web','browser'] expresses
'I want these extras', not 'silently strip MCP'. Parent MCP tools
(registered at runtime) should survive narrowing by default.
- Drop _config_version bump (22->23); additive nested key under
delegation.* is handled by _deep_merge, no migration needed.
- Update tests to reflect new default behavior.
New and newer models from models.dev now surface automatically in
/model (both hermes model CLI and the gateway Telegram/Discord picker)
for a curated set of secondary providers — no Hermes release required
when the registry publishes a new model.
Primary user-visible fix: on OpenCode Go, typing '/model mimo-v2.5-pro'
no longer silently fuzzy-corrects to 'mimo-v2-pro'. The exact match
against the merged models.dev catalog wins.
Scope (opt-in frozenset _MODELS_DEV_PREFERRED in hermes_cli/models.py):
opencode-go, opencode-zen, deepseek, kilocode, fireworks, mistral,
togetherai, cohere, perplexity, groq, nvidia, huggingface, zai,
gemini, google.
Explicitly NOT merged:
- openrouter and nous (never): curated list is already a hand-picked
subset / Portal is source of truth.
- xai, xiaomi, minimax, minimax-cn, kimi-coding, kimi-coding-cn,
alibaba, qwen-oauth (per-project decision to keep curated-only).
- providers with dedicated live-endpoint paths (copilot, anthropic,
ai-gateway, ollama-cloud, custom, stepfun, openai-codex) — those
paths already handle freshness themselves.
Changes:
- hermes_cli/models.py: add _MODELS_DEV_PREFERRED + _merge_with_models_dev
helper. provider_model_ids() branches on the set at its curated-fallback
return. Merge is models.dev-first, curated-only extras appended,
case-insensitive dedup, graceful fallback when models.dev is offline.
- hermes_cli/model_switch.py: list_authenticated_providers() calls the
same merge in both its code paths (PROVIDER_TO_MODELS_DEV loop +
HERMES_OVERLAYS loop). Picker AND validation-fallback both see
fresh entries.
- tests/hermes_cli/test_models_dev_preferred_merge.py (new): 13 tests —
merge-helper unit tests (empty/raise/order/dedup), opencode-go/zen
behavior, openrouter+nous explicitly guarded from merge.
- tests/hermes_cli/test_opencode_go_in_model_list.py: converted from
snapshot-style assertion to a behavior-based floor check, so it
doesn't break when models.dev publishes additional opencode-go
entries.
Addresses a report from @pfanis via Telegram: newer Xiaomi variants
on OpenCode Go weren't appearing in the /model picker, and /model
was silently routing requests for new variants to older ones.
Plugin slash commands now surface as first-class commands in every gateway
enumerator — Discord native slash picker, Telegram BotCommand menu, Slack
/hermes subcommand map — without a separate per-platform plugin API.
The existing 'command:<name>' gateway hook gains a decision protocol via
HookRegistry.emit_collect(): handlers that return a dict with
{'decision': 'deny'|'handled'|'rewrite'|'allow'} can intercept slash
command dispatch before core handling runs, unifying what would otherwise
have been a parallel 'pre_gateway_command' hook surface.
Changes:
- gateway/hooks.py: add HookRegistry.emit_collect() that fires the same
handler set as emit() but collects non-None return values. Backward
compatible — fire-and-forget telemetry hooks still work via emit().
- hermes_cli/plugins.py: add optional 'args_hint' param to
register_command() so plugins can opt into argument-aware native UI
registration (Discord arg picker, future platforms).
- hermes_cli/commands.py: add _iter_plugin_command_entries() helper and
merge plugin commands into telegram_bot_commands() and
slack_subcommand_map(). New is_gateway_known_command() recognizes both
built-in and plugin commands so the gateway hook fires for either.
- gateway/platforms/discord.py: extract _build_auto_slash_command helper
from the COMMAND_REGISTRY auto-register loop and reuse it for
plugin-registered commands. Built-in name conflicts are skipped.
- gateway/run.py: before normal slash dispatch, call emit_collect on
command:<canonical> and honor deny/handled/rewrite/allow decisions.
Hook now fires for plugin commands too.
- scripts/release.py: AUTHOR_MAP entry for @Magaav.
- Tests: emit_collect semantics, plugin command surfacing per platform,
decision protocol (deny/handled/rewrite/allow + non-dict tolerance),
Discord plugin auto-registration + conflict skipping, is_gateway_known_command.
Salvaged from #14131 (@Magaav). Original PR added a parallel
'pre_gateway_command' hook and a platform-keyed plugin command
registry; this re-implementation reuses the existing 'command:<name>'
hook and treats plugin commands as platform-agnostic so the same
capability reaches Telegram and Slack without new API surface.
Co-authored-by: Magaav <73175452+Magaav@users.noreply.github.com>
Replace xiaomi/mimo-v2-pro with xiaomi/mimo-v2.5-pro and xiaomi/mimo-v2.5
in the OpenRouter fallback catalog and the nous provider model list.
Add matching DEFAULT_CONTEXT_LENGTHS entries (1M tokens each).
Adds security.allow_private_urls / HERMES_ALLOW_PRIVATE_URLS toggle so
users on OpenWrt routers, TUN-mode proxies (Clash/Mihomo/Sing-box),
corporate split-tunnel VPNs, and Tailscale networks — where DNS resolves
public domains to 198.18.0.0/15 or 100.64.0.0/10 — can use web_extract,
browser, vision URL fetching, and gateway media downloads.
Single toggle in tools/url_safety.py; all 23 is_safe_url() call sites
inherit automatically. Cached for process lifetime.
Cloud metadata endpoints stay ALWAYS blocked regardless of the toggle:
169.254.169.254 (AWS/GCP/Azure/DO/Oracle), 169.254.170.2 (AWS ECS task
IAM creds), 169.254.169.253 (Azure IMDS wire server), 100.100.100.200
(Alibaba), fd00:ec2::254 (AWS IPv6), the entire 169.254.0.0/16
link-local range, and the metadata.google.internal / metadata.goog
hostnames (checked pre-DNS so they can't be bypassed on networks where
those names resolve to local IPs).
Supersedes #3779 (narrower HERMES_ALLOW_RFC2544 for the same class of
users).
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Copilot on #14138 flagged that the share report says '(file not found)'
when the log exists but is empty (either because the primary is empty
and no .1 rotation exists, or in the rare race where the file is
truncated between _resolve_log_path() and stat()).
- Split _primary_log_path() out of _resolve_log_path so both can share
the LOG_FILES/home math without duplication.
- _capture_log_snapshot now reports '(file empty)' when the primary
path exists on disk with zero bytes, and keeps '(file not found)'
for the truly-missing case.
Tests: rename test_returns_none_for_empty → test_empty_primary_reports_file_empty
with the new assertion, plus a race-path test that monkeypatches
_resolve_log_path to exercise the size==0 branch directly.
The salvaged status-bar skin keys were seeded on the default skin, but
_build_skin_config merges default.colors into every skin — so daylight
and warm-lightmode silently inherited silver status_bar_text (#C0C0C0)
on their light backgrounds, rendering as low-contrast gray on gray.
Drop the seven status_bar_{text,strong,dim,good,warn,bad,critical}
entries from the default skin's colors and let get_prompt_toolkit_style
_overrides fall back to banner_text / banner_title / banner_dim /
ui_ok / ui_warn / ui_error. Dark skins keep their explicit overrides
and render identically; light skins now inherit their own dark banner
colors for readable status-bar text.
Drop rebased test assumptions about theme-mode helpers removed on main and keep the status bar skin integration aligned with the current skin engine model.
Route prompt_toolkit status bar colors through the skin engine so /skin updates the status bar alongside the rest of the interactive TUI.
Add regression coverage for the new status bar style override keys and CLI style composition.
These thin wrappers around _capture_log_snapshot had zero production
callers after the snapshot refactor — run_debug_share uses snapshots
directly and collect_debug_report captures internally. The wrappers
also caused a performance regression: _read_log_tail read up to 512KB
and built full_text just to return tail_text.
Remove both wrappers and migrate TestReadFullLog → TestCaptureLogSnapshot
to test _capture_log_snapshot directly. Same coverage, tests the real
API instead of dead indirection.
Adapt the byte-boundary-safe truncation fix from PR #14040 by
taosiyuan163 into the new _capture_log_snapshot() code path: when
the truncation cut lands exactly on a line boundary, keep the first
retained line instead of unconditionally dropping it.
Also add a 2x max_bytes safety cap to the backward-reading loop to
prevent unbounded memory consumption when log files contain very long
lines (e.g. JSON blobs) with few newlines.
Based on #14040 by @taosiyuan163.
* fix(plugins): auto-coerce user-installed memory plugins to kind=exclusive
User-installed memory provider plugins at $HERMES_HOME/plugins/<name>/
were being dispatched to the general PluginManager, which has no
register_memory_provider method on PluginContext. Every startup logged:
Failed to load plugin 'mempalace': 'PluginContext' object has no
attribute 'register_memory_provider'
Bundled memory providers were already skipped via skip_names={memory,
context_engine} in discover_and_load, but user-installed ones weren't.
Fix: _parse_manifest now scans the plugin's __init__.py source for
'register_memory_provider' or 'MemoryProvider' (same heuristic as
plugins/memory/__init__.py:_is_memory_provider_dir) and auto-coerces
kind to 'exclusive' when the manifest didn't declare one explicitly.
This routes the plugin to plugins/memory discovery instead of the
general loader.
The escape hatch: if a manifest explicitly declares kind: standalone,
the heuristic doesn't override it.
Reported by Uncle HODL on Discord.
* fix(nous): actionable CLI message when Nous 401 refresh fails
Mirrors the Anthropic 401 diagnostic pattern. When Nous returns 401
and the credential refresh (_try_refresh_nous_client_credentials)
also fails, the user used to see only the raw APIError. Now prints:
🔐 Nous 401 — Portal authentication failed.
Response: <truncated body>
Most likely: Portal OAuth expired, account out of credits, or
agent key revoked.
Troubleshooting:
• Re-authenticate: hermes login --provider nous
• Check credits / billing: https://portal.nousresearch.com
• Verify stored credentials: $HERMES_HOME/auth.json
• Switch providers temporarily: /model <model> --provider openrouter
Addresses the common 'my hermes model hangs' pattern where the user's
Portal OAuth expired and the CLI gave no hint about the next step.
Adds schema v7 'api_call_count' column. run_agent.py increments it by 1
per LLM API call, web_server analytics SQL aggregates it, frontend uses
the real counter instead of summing sessions.
The 'API Calls' card on the analytics dashboard previously displayed
COUNT(*) from the sessions table — the number of conversations, not
LLM requests. Each session makes 10-90 API calls through the tool loop,
so the reported number was ~30x lower than real.
Salvaged from PR #10140 (@kshitijk4poor). The cache-token accuracy
portions of the original PR were deferred — per-provider analytics is
the better path there, since cache_write_tokens and actual_cost_usd
are only reliably available from a subset of providers (Anthropic
native, Codex Responses, OpenRouter with usage.include).
Tests:
- schema_version v7 assertion
- migration v2 -> v7 adds api_call_count column with default 0
- update_token_counts increments api_call_count by provided delta
- absolute=True sets api_call_count directly
- /api/analytics/usage exposes total_api_calls in totals
- Replace async create_bind_task/poll_bind_result with synchronous
httpx.Client equivalents, eliminating manual event loop management
- Move _render_qr and full qr_register() entry-point into onboard.py,
mirroring the Feishu onboarding pattern
- Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines);
call site becomes a single qr_register() import
- Fix potential segfault: previous code called loop.close() in the EXPIRED
branch and again in the finally block (double-close crashed under uvloop)
* feat(state): auto-prune old sessions + VACUUM state.db at startup
state.db accumulates every session, message, and FTS5 index entry forever.
A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages
causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM
brought it to 43MB. The prune command and VACUUM are not wired to run
automatically anywhere — sessions grew unbounded until users noticed.
Changes:
- hermes_state.py: new state_meta key/value table, vacuum() method, and
maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in
state_meta so it only actually executes once per min_interval_hours
across all Hermes processes for a given HERMES_HOME. Never raises.
- hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG
(auto_prune=True, retention_days=90, vacuum_after_prune=True,
min_interval_hours=24). Added to _KNOWN_ROOT_KEYS.
- cli.py: call maintenance once at HermesCLI init (shared helper
_run_state_db_auto_maintenance reads config and delegates to DB).
- gateway/run.py: call maintenance once at GatewayRunner init.
- Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section.
Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed
pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays
bloated forever. VACUUM only runs when the prune actually removed rows,
so tight DBs don't pay the I/O cost.
Tests: 10 new tests in tests/test_hermes_state.py covering state_meta,
vacuum, idempotency, interval skipping, VACUUM-only-when-needed,
corrupt-marker recovery. All 246 existing state/config/gateway tests
still pass.
Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG
exposes the new block, load_config() returns it for fresh installs,
first call prunes+vacuums, second call within min_interval_hours skips,
and the state_meta marker persists across connection close/reopen.
* sessions.auto_prune defaults to false (opt-in)
Session history powers session_search recall across past conversations,
so silently pruning on startup could surprise users. Ship the machinery
disabled and let users opt in when they notice state.db is hurting
performance.
- DEFAULT_CONFIG.sessions.auto_prune: True → False
- Call-site fallbacks in cli.py and gateway/run.py match the new default
(so unmigrated configs still see off)
- Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:
- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
mapping, and StepFun region/model persistence
Based on #6005 by @hengm3467.
Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds a ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
gpt-image-2-low ~15s fast iteration
gpt-image-2-medium ~40s default
gpt-image-2-high ~2min highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
image_gen.openai.model: gpt-image-2-high
# or
image_gen.model: gpt-image-2-low
# or env var for scripts/tests
OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
Surfaces the free variant alongside the paid minimax-m2.5 entry in
both the OPENROUTER_MODELS fallback snapshot and the nous/openrouter
provider model list.
Remove nvidia/nemotron-3-super-120b-a12b:free, arcee-ai/trinity-large-preview:free,
and openrouter/elephant-alpha from _PROVIDER_MODELS['nous']. The paid nemotron and
arcee-thinking variants remain.
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.
hermes_cli/models.py
- fetch_nous_recommended_models(portal_base_url, force_refresh=False)
10-minute TTL cache, keyed per portal URL (staging vs prod don't
collide). Public endpoint, no auth required. Returns {} on any
failure so callers always get a dict.
- get_nous_recommended_aux_model(vision, free_tier=None, ...)
Tier-aware pick from the payload:
- Paid tier → paidRecommended{Vision,Compaction}Model, falling back
to freeRecommended* when the paid field is null (common during
staged rollouts of new paid models).
- Free tier → freeRecommended* only, never leaks paid models.
When free_tier is None, auto-detects via the existing
check_nous_free_tier() helper (already cached 3 min against
/api/oauth/account). Detection errors default to paid so we never
silently downgrade a paying user.
agent/auxiliary_client.py — _try_nous()
- Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
to get_nous_recommended_aux_model(vision=vision).
- Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
Portal is unreachable or returns a null recommendation.
- The Portal is now the source of truth for aux model selection; the
xiaomi allowlist we used to carry is effectively dead.
Tests (15 new)
- tests/hermes_cli/test_models.py::TestNousRecommendedModels
Fetch caching, per-portal keying, network failure, force_refresh;
paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
auto-detect, detection-error → paid default, null/blank modelName
handling.
- tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
_try_nous honors Portal recommendation for text + vision, falls
back to google/gemini-3-flash-preview on None or exception.
Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
Drop _NOUS_ALLOWED_FREE_MODELS + filter_nous_free_models and its two call
sites. Whatever Nous Portal prices as free now shows up in the picker as-is
— no local allowlist gatekeeping. Free-tier partitioning (paid vs free in
the menu) still runs via partition_nous_models_by_tier.
Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches:
- KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1).
The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK
appends /v1/messages internally. /coding/v1 + SDK suffix produced
/coding/v1/v1/messages (a 404). /coding + SDK suffix now yields
/coding/v1/messages correctly.
- kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so
non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are
already redirected to api.kimi.com/coding via _resolve_kimi_base_url.
- doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0)
and rewrite /coding base URLs to /coding/v1 for the /models health
check (Anthropic surface has no /models).
- test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var.
E2E verified:
sk-kimi-<key> → https://api.kimi.com/coding/v1/messages (Anthropic)
sk-<legacy> → https://api.moonshot.ai/v1/chat/completions (OpenAI)
UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>