New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
Remove hallucinated providers (openai, deepseek, together, groq,
fireworks, mistral, gemini, nous) from the fallback provider map.
These don't exist in hermes-agent's provider system.
The real supported providers for fallback are:
openrouter (OPENROUTER_API_KEY)
zai (ZAI_API_KEY)
kimi-coding (KIMI_API_KEY)
minimax (MINIMAX_API_KEY)
minimax-cn (MINIMAX_CN_API_KEY)
For any other OpenAI-compatible endpoint, users can use the
base_url + api_key_env overrides in the config.
Also adds Kimi User-Agent header for kimi fallback (matching
the main provider system).
When the primary model/provider fails after retries (rate limit, overload,
auth errors, connection failures), Hermes automatically switches to a
configured fallback model for the remainder of the session.
Config (in ~/.hermes/config.yaml):
fallback_model:
provider: openrouter
model: anthropic/claude-sonnet-4
Supports all major providers: OpenRouter, OpenAI, Nous, DeepSeek, Together,
Groq, Fireworks, Mistral, Gemini — plus custom endpoints via base_url and
api_key_env overrides.
Design principles:
- Dead simple: one fallback model, not a chain
- One-shot: switches once, doesn't ping-pong back
- Zero new dependencies: uses existing OpenAI client
- Minimal code: ~100 lines in run_agent.py, ~5 lines in cli.py/gateway
- Three trigger points: max retries exhausted, non-retryable client errors,
and invalid response exhaustion
Does NOT trigger on context overflow or payload-too-large errors (those
are handled by the existing compression system).
Addresses #737.
25 new tests, 2492 total passing.
Complete Signal adapter using signal-cli daemon HTTP API.
Based on PR #268 by ibhagwan, rebuilt on current main with bug fixes.
Architecture:
- SSE streaming for inbound messages with exponential backoff (2s→60s)
- JSON-RPC 2.0 for outbound (send, typing, attachments, contacts)
- Health monitor detects stale SSE connections (120s threshold)
- Phone number redaction in all logs and global redact.py
Features:
- DM and group message support with separate access policies
- DM policies: pairing (default), allowlist, open
- Group policies: disabled (default), allowlist, open
- Attachment download with magic-byte type detection
- Typing indicators (8s refresh interval)
- 100MB attachment size limit, 8000 char message limit
- E.164 phone + UUID allowlist support
Integration:
- Platform.SIGNAL enum in gateway/config.py
- Signal in _is_user_authorized() allowlist maps (gateway/run.py)
- Adapter factory in _create_adapter() (gateway/run.py)
- user_id_alt/chat_id_alt fields in SessionSource for UUIDs
- send_message tool support via httpx JSON-RPC (not aiohttp)
- Interactive setup wizard in 'hermes gateway setup'
- Connectivity testing during setup (pings /api/v1/check)
- signal-cli detection and install guidance
Bug fixes from PR #268:
- Timestamp reads from envelope_data (not outer wrapper)
- Uses httpx consistently (not aiohttp in send_message tool)
- SIGNAL_DEBUG scoped to signal logger (not root)
- extract_images regex NOT modified (preserves group numbering)
- pairing.py NOT modified (no cross-platform side effects)
- No dual authorization (adapter defers to run.py for user auth)
- Wildcard uses set membership ('*' in set, not list equality)
- .zip default for PK magic bytes (not .docx)
No new Python dependencies — uses httpx (already core).
External requirement: signal-cli daemon (user-installed).
Tests: 30 new tests covering config, init, helpers, session source,
phone redaction, authorization, and send_message integration.
Co-authored-by: ibhagwan <ibhagwan@users.noreply.github.com>
The gateway had a SEPARATE compression system ('session hygiene')
with hardcoded thresholds (100k tokens / 200 messages) that were
completely disconnected from the model's context length and the
user's compression config in config.yaml. This caused premature
auto-compression on Telegram/Discord — triggering at ~60k tokens
(from the 200-message threshold) or inconsistent token counts.
Changes:
- Gateway hygiene now reads model name from config.yaml and uses
get_model_context_length() to derive the actual context limit
- Compression threshold comes from compression.threshold in
config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (was redundant and caused
false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the
standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD,
CONTEXT_COMPRESSION_ENABLED) are respected
- Warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across
models, and that message count alone no longer triggers compression
For claude-opus-4.6 (200k context) at 85% threshold: gateway
hygiene now triggers at 170k tokens instead of the old 100k.
The 'openai' provider was redundant — using OPENAI_BASE_URL +
OPENAI_API_KEY with provider: 'main' already covers direct OpenAI API.
Provider options are now: auto, openrouter, nous, codex, main.
- Removed _try_openai(), _OPENAI_AUX_MODEL, _OPENAI_BASE_URL
- Replaced openai tests with codex provider tests
- Updated all docs to remove 'openai' option and clarify 'main'
- 'main' description now explicitly mentions it works with OpenAI API,
local models, and any OpenAI-compatible endpoint
Tests: 2467 passed.
The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.
Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
multimodal format to Responses API format:
- {type: 'text'} → {type: 'input_text'}
- {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)
Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples
Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.
Tests: 2459 passed.
The Codex model normalization was rejecting any model without 'codex'
in its name, forcing a fallback to gpt-5.3-codex. This blocked models
like gpt-5.4 that the Codex API actually supports.
The fix simplifies _normalize_model_for_provider() to two operations:
1. Strip provider prefixes (API needs bare slugs)
2. Replace the *untouched default* model with a Codex-compatible one
If the user explicitly chose a model — any model — we trust them and
let the API be the judge. No allowlists, no slug checks.
Also removes the 'codex not in slug' filter from _read_cache_models()
so the local cache preserves all API-available models.
Inspired by OpenClaw's approach which explicitly lists non-codex models
(gpt-5.4, gpt-5.2) as valid Codex models.
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.
Provider options are now: auto, openrouter, nous, openai, main.
Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
_resolve_forced_provider(), updated auxiliary_max_tokens_param()
to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
configuration.md with Common Setups section showing OpenAI,
OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution
Tests: 2459 passed (was 2429).
Improvements on top of PR #606 (auxiliary model configuration):
1. Gateway bridge: Added auxiliary.* and compression.summary_provider
config bridging to gateway/run.py so config.yaml settings work from
messaging platforms (not just CLI). Matches the pattern in cli.py.
2. Vision auto-fallback safety: In auto mode, vision now only tries
OpenRouter + Nous Portal (known multimodal-capable providers).
Custom endpoints, Codex, and API-key providers are skipped to avoid
confusing errors from providers that don't support vision input.
Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
allows using any provider.
3. Comprehensive tests (46 new):
- _get_auxiliary_provider env var resolution (8 tests)
- _resolve_forced_provider with all provider types (8 tests)
- Per-task provider routing integration (4 tests)
- Vision auto-fallback safety (7 tests)
- Config bridging logic (11 tests)
- Gateway/CLI bridge parity (2 tests)
- Vision model override via env var (2 tests)
- DEFAULT_CONFIG shape validation (4 tests)
4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
Updated module docstring with separate text/vision resolution chains.
Tests: 2429 passed (was 2383).
- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.
Add contextual [Hint: ...] suffixes to tool results where they save
real iterations:
- patch (no match): suggests read_file/search_files to verify content
before retrying — addresses the common pattern where the agent retries
with stale old_string instead of re-reading the file.
- search_files (truncated): provides explicit next offset and suggests
narrowing the search — clearer than relying on total_count inference.
Other hints proposed in #722 (terminal, web_search, web_extract,
browser_snapshot, search zero-results, search content-matches) were
evaluated and found to be low-value: either already covered by existing
mechanisms (read_file pagination, similar-files, schema descriptions)
or guidance the agent already follows from its own reasoning.
5 new tests covering hint presence/absence for both tools.
When resuming a session via --continue or --resume, show a compact recap
of the previous conversation inside a Rich panel before the input prompt.
This gives users immediate visual context about what was discussed.
Changes:
- Add _preload_resumed_session() to load session history early (in run(),
before banner) so _init_agent() doesn't need a separate DB round-trip
- Add _display_resumed_history() that renders a formatted recap panel:
* User messages shown with gold bullet (truncated at 300 chars)
* Assistant responses shown with green diamond (truncated at 200 chars / 3 lines)
* Tool calls collapsed to count + tool names
* System messages and tool results hidden
* <REASONING_SCRATCHPAD> blocks stripped from display
* Pure-reasoning messages (no visible output) skipped entirely
* Capped at last 10 exchanges with 'N earlier messages' indicator
* Dim/muted styling distinguishes recap from active conversation
- Add display.resume_display config option: 'full' (default) or 'minimal'
- Store resume_display as instance variable (like compact) for testability
- 27 new tests covering all display scenarios, config, and edge cases
Closes#719
NOUS_API_KEY is unused — vision tools use OPENROUTER_API_KEY or Nous
Portal OAuth (auth.json), and MoA tools use OPENROUTER_API_KEY.
Removed from:
- hermes_cli/config.py: api_keys allowlist for config set routing
- .env.example: example env file entry and comment
- tests/hermes_cli/test_set_config_value.py: parametrize test data
- tests/integration/test_web_tools.py: updated comments and log
messages to reference 'auxiliary LLM provider' instead of NOUS_API_KEY
No HECATE references found in codebase (already cleaned up).
Add `hermes sessions browse` — a curses-based interactive session picker
with live type-to-search filtering, arrow key navigation, and seamless
session resume via Enter.
Features:
- Arrow keys to navigate, Enter to select and resume, Esc/q to quit
- Type characters to live-filter sessions by title, preview, source, or ID
- Backspace to edit filter, first Esc clears filter, second Esc exits
- Adaptive column layout (title/preview, last active, source, ID)
- Scrolling support for long session lists
- --source flag to filter by platform (cli, telegram, discord, etc.)
- --limit flag to control how many sessions to load (default: 50)
- Windows fallback: numbered list with input prompt
- After selection, seamlessly execs into `hermes --resume <id>`
Design decisions:
- Separate subcommand (not a flag on -c) — preserves `hermes -c` as-is
for instant most-recent-session resume
- Uses curses (not simple_term_menu) per Known Pitfalls to avoid the
arrow-key ghost-duplication rendering bug in tmux/iTerm
- Follows existing curses pattern from hermes_cli/tools_config.py
Also fixes: removed redundant `import os` inside cmd_sessions stats
block that shadowed the module-level import (would cause UnboundLocalError
if browse action was taken in the same function).
Tests: 33 new tests covering curses picker, fallback mode, filtering,
navigation, edge cases, and argument parser registration.
Verifies that setup.py imports the correct function name
(get_codex_model_ids) from codex_models.py. This would have caught
the ImportError bug before it reached users.
Source code (hermes_cli/clipboard.py):
- _convert_to_png() lost the file when both Pillow and ImageMagick were
unavailable: path.rename(tmp) moved the file to .bmp, then subprocess.run
raised FileNotFoundError, but the file was never renamed back. The final
fallback 'return path.exists()' returned False.
- Fix: restore the original file in both except handlers by renaming tmp
back to path when the original is missing.
Test (tests/tools/test_clipboard.py):
- test_file_still_usable_when_no_converter expected 'from PIL import Image'
to raise an Exception, but Pillow is installed so pytest.raises fired
'DID NOT RAISE'. The test also never called _convert_to_png().
- Fix: properly mock PIL unavailability via patch.dict(sys.modules),
actually call _convert_to_png(), and assert the correct result.
Messaging users can now switch back to previously-named sessions:
- /resume My Project — resolves the title (with auto-lineage) and
restores that session's conversation history
- /resume (no args) — lists recent titled sessions to choose from
Adds SessionStore.switch_session() which ends the current session and
points the session entry at the target session ID so the old transcript
is loaded on the next message. Running agents are cleared on switch.
Completes the session naming feature from PR #720 for gateway users.
8 new tests covering: name resolution, lineage auto-latest, already-on-
session check, nonexistent names, agent cleanup, no-DB fallback, and
listing titled sessions.
When _ensure_runtime_credentials() resolves the provider to openai-codex,
check if the active model is Codex-compatible. If not (e.g. the default
anthropic/claude-opus-4.6), swap it for the best available Codex model.
Also strips provider prefixes the Codex API rejects (openai/gpt-5.3-codex
→ gpt-5.3-codex).
Adds _model_is_default flag so warnings are only shown when the user
explicitly chose an incompatible model (not when it's the config default).
Fixes#651.
Co-inspired-by: stablegenius49 (PR #661)
Co-inspired-by: teyrebaz33 (PR #696)
Previously, search_files would silently return 0 results when the
search path didn't exist (e.g., /root/.hermes/... when HOME is
/home/user). The path was passed to rg/grep/find which would fail
silently, and the empty stdout was parsed as 'no matches found'.
Changes:
- Add path existence check at the top of search() using test -e.
Returns SearchResult with a clear error message when path doesn't exist.
- Add exit code 2 checks in _search_with_rg() and _search_with_grep()
as secondary safety net for other error types (bad regex, permissions).
- Add 4 new tests covering: nonexistent path (content mode), nonexistent
path (files mode), existing path proceeds normally, rg error exit code.
Tests: 37 → 41 in test_file_operations.py, full suite 2330 passed.
- Empty string titles normalized to None (prevents uncaught IntegrityError
when two sessions both get empty-string titles via the unique index)
- Escape SQL LIKE wildcards (%, _) in resolve_session_by_title and
get_next_title_in_lineage to prevent false matches on titles like
'test_project' matching 'testXproject #2'
- Optimize list_sessions_rich from N+2 queries to a single query with
correlated subqueries (preview + last_active computed in SQL)
- Add /title slash command to gateway (Telegram, Discord, Slack, WhatsApp)
with set and show modes, uniqueness conflict handling
- Add /title to gateway /help text and _known_commands
- 12 new tests: empty string normalization, multi-empty-title safety,
SQL wildcard edge cases, gateway /title set/show/conflict/cross-platform
Two 'except Exception: pass' blocks silently hide failures:
- mirror_to_session failure: user's message never gets mirrored, no trace
- config.yaml parse failure: wrong model used silently
Replace with logger.warning so failures are visible in logs.
Completed/cancelled items are now filtered from format_for_injection()
output. Update the existing test to verify active items appear and
completed items are excluded.
- Block file reads after 3+ re-reads of same region (no content returned)
- Track search_files calls and block repeated identical searches
- Filter completed/cancelled todos from post-compression injection
to prevent agent from re-doing finished work
- Add 10 new tests covering all three fixes
When context compression summarizes conversation history, the agent
loses track of which files it already read and re-reads them in a loop.
Users report the agent reading the same files endlessly without writing.
Root cause: context compression is lossy — file contents and read history
are lost in the summary. After compression, the model thinks it hasn't
examined the files yet and reads them again.
Fix (two-part):
1. Track file reads per task in file_tools.py. When the same file region
is read again, include a _warning in the response telling the model
to stop re-reading and use existing information.
2. After context compression, inject a structured message listing all
files already read in the session with explicit "do NOT re-read"
instruction, preserving read history across compression boundaries.
Adds 16 tests covering warning detection, task isolation, summary
accuracy, tracker cleanup, and compression history injection.
Call ensure_hub_dirs() at the start of hermes skills list so the\nSkills Hub directory structure is created before reading hub\nmetadata.\n\nAdd a regression test covering the empty-home path where\ndoctor recommends running the list command.\n\nRefs: #703
Images pasted in the CLI were embedded as raw base64 image_url content
parts in the conversation history, which only works with vision-capable
models. If the main model (e.g. Nous API) doesn't support vision, this
breaks the request and poisons all subsequent messages.
Now the CLI uses the same approach as the messaging gateway: images are
pre-processed through the auxiliary vision model (Gemini Flash via
OpenRouter or Nous Portal) and converted to text descriptions. The
local file path is included so the agent can re-examine via
vision_analyze if needed. Works with any model.
Fixes#638.
User messaging improvements:
- Rejection: '(>_<) Error: not a valid model' instead of '(^_^) Warning: Error:'
- Rejection: shows 'Model unchanged' + tip about /model and /provider
- Session-only: explains 'this session only' with reason and 'will revert on restart'
- Saved: clear '(saved to config)' confirmation
Docs updated:
- cli-commands.md, cli.md, messaging/index.md: /model now shows
provider:model syntax, /provider command added to tables
Test fixes: deduplicated test names, assertions match new messages.
/provider command (CLI + gateway):
Shows all providers with auth status (✓/✗), aliases, and active marker.
Users can now discover what provider names work with provider:model syntax.
Gateway bugs fixed:
- Config was saved even when validation.persist=False (told user 'session
only' but actually persisted the unvalidated model)
- HERMES_INFERENCE_PROVIDER env var not set on provider switch, causing
the switch to be silently overridden if that env var was already set
parse_model_input hardened:
- Colon only treated as provider delimiter if left side is a recognized
provider name or alias. 'anthropic/claude-3.5-sonnet:beta' now passes
through as a model name instead of trying provider='anthropic/claude-3.5-sonnet'.
- HTTP URLs, random colons no longer misinterpreted.
56 tests passing across model validation, CLI commands, and integration.
Add provider:model syntax to /model command for runtime provider switching:
/model zai:glm-5 → switch to Z.AI provider with glm-5
/model nous:hermes-3 → switch to Nous Portal with hermes-3
/model openrouter:anthropic/claude-sonnet-4.5 → explicit OpenRouter
When switching providers, credentials are resolved via resolve_runtime_provider
and validated before committing. Both model and provider are saved to config.
Provider aliases work (glm: → zai, kimi: → kimi-coding, etc.).
Enhanced /model (no args) display now shows:
- Current model and provider
- Curated model list for the current provider with ← marker
- Usage examples including provider:model syntax
39 tests covering parse_model_input, curated_models_for_provider,
provider switching (success + credential failure), and display output.
The 200 lines of prompt_toolkit/rich/fire stubs added in PR #650 were
guarded by 'if module in sys.modules: return' and never activated since
those dependencies are always installed. Removed to keep the test file
lean. Also removed unused MagicMock and pytest imports.
Not all providers require 'provider/model' format. Removing the rigid
format check lets the live API probe handle all validation uniformly.
If someone types 'gpt-5.4' on OpenRouter, the probe won't find it and
will suggest 'openai/gpt-5.4' — better UX than a format rejection.
Replace the static catalog-based model validation with a live API probe.
The /model command now hits the provider's /models endpoint to check if
the requested model actually exists:
- Model found in API → accepted + saved to config
- Model NOT found in API → rejected with 'Error: not a valid model'
and fuzzy-match suggestions from the live model list
- API unreachable → graceful fallback to hardcoded catalog (session-only
for unrecognized models)
- Format errors (empty, spaces, missing '/') still caught instantly
without a network call
The API probe takes ~0.2s for OpenRouter (346 models) and works with any
OpenAI-compatible endpoint (Ollama, vLLM, custom, etc.).
32 tests covering all paths: format checks, API found, API not found,
API unreachable fallback, CLI integration.
- Wrap validate_requested_model in try/except so /model doesn't crash
if validation itself fails (falls back to old accept+save behavior)
- Remove unnecessary sys.path.insert from both test files
- Expand test_model_validation.py: 4 → 23 tests covering normalize_provider,
provider_model_ids, empty/whitespace/spaces rejection, OpenRouter format
validation, custom endpoints, nous provider, provider aliases, unknown
providers, fuzzy suggestions
- Expand test_cli_model_command.py: 2 → 5 tests adding known-model save,
validation crash fallback, and /model with no argument
Skills can now declare runtime prerequisites (env vars, CLI binaries) via
YAML frontmatter. Skills with unmet prerequisites are excluded from the
system prompt so the agent never claims capabilities it can't deliver, and
skill_view() warns the agent about what's missing.
Three layers of defense:
- build_skills_system_prompt() filters out unavailable skills
- _find_all_skills() flags unmet prerequisites in metadata
- skill_view() returns prerequisites_warning with actionable details
Tagged 12 bundled skills that have hard runtime dependencies:
gif-search (TENOR_API_KEY), notion (NOTION_API_KEY), himalaya, imessage,
apple-notes, apple-reminders, openhue, duckduckgo-search, codebase-inspection,
blogwatcher, songsee, mcporter.
Closes#658Fixes#630
Removed the hard block on base_url containing 'api.anthropic.com'.
Anthropic now offers an OpenAI-compatible /chat/completions endpoint,
so blocking their URL prevents legitimate use. If the endpoint isn't
compatible, the API call will fail with a proper error anyway.
Removed from: run_agent.py, mini_swe_runner.py
Updated test to verify Anthropic URLs are accepted.
browser_vision now saves screenshots persistently to ~/.hermes/browser_screenshots/
and returns the screenshot_path in its JSON response. The model can include
MEDIA:<path> in its response to share screenshots as native photos.
Changes:
- browser_tool.py: Save screenshots persistently, return screenshot_path,
auto-cleanup files older than 24 hours, mkdir moved inside try/except
- telegram.py: Add send_image_file() — sends local images via bot.send_photo()
- discord.py: Add send_image_file() — sends local images via discord.File
- slack.py: Add send_image_file() — sends local images via files_upload_v2()
(WhatsApp already had send_image_file — no changes needed)
- prompt_builder.py: Updated Telegram hint to list image extensions,
added Discord and Slack MEDIA: platform hints
- browser.md: Document screenshot sharing and 24h cleanup
- send_file_integration_map.md: Updated to reflect send_image_file is now
implemented on Telegram/Discord/Slack
- test_send_image_file.py: 19 tests covering MEDIA: .png extraction,
send_image_file on all platforms, and screenshot cleanup
Partially addresses #466 (Phase 0: platform adapter gaps for send_image_file).
Authored by christomitov. Auto-detects sk-kimi- key prefix and routes
to api.kimi.com/coding/v1. Adds User-Agent header for Kimi Code API
compatibility. Legacy Moonshot keys continue to work unchanged.
Critical fixes:
- Add --worktree/-w to hermes_cli/main.py argparse (both chat
subcommand and top-level parser) so 'hermes -w' works via the
actual CLI entry point, not just 'python cli.py -w'
- Pass worktree flag through cmd_chat() kwargs to cli_main()
- Handle worktree attr in bare 'hermes' and --resume/--continue paths
Bug fixes in cli.py:
- Skip worktree creation for --list-tools/--list-toolsets (wasteful)
- Wrap git worktree subprocess.run in try/except (crash on timeout)
- Add stale worktree pruning on startup (_prune_stale_worktrees):
removes clean worktrees older than 24h left by crashed/killed sessions
Documentation updates:
- AGENTS.md: add --worktree to CLI commands table
- cli-config.yaml.example: add worktree config section
- website/docs/reference/cli-commands.md: add to core commands
- website/docs/user-guide/cli.md: add usage examples
- website/docs/user-guide/configuration.md: add config docs
Test improvements (17 → 31 tests):
- Stale worktree pruning (prune old clean, keep recent, keep dirty)
- Directory symlink via .worktreeinclude
- Edge cases (no commits, not a repo, pre-existing .worktrees/)
- CLI flag/config OR logic
- TERMINAL_CWD integration
- System prompt injection format
Add a --worktree (-w) flag to the hermes CLI that creates an isolated
git worktree for the session. This allows running multiple hermes-agent
instances concurrently on the same repo without file collisions.
How it works:
- On startup with -w: detects git repo, creates .worktrees/<session>/
with its own branch (hermes/<session-id>), sets TERMINAL_CWD to it
- Each agent works in complete isolation — independent HEAD, index,
and working tree, shared git object store
- On exit: auto-removes worktree and branch if clean, warns and
keeps if there are uncommitted changes
- .worktreeinclude file support: list gitignored files (.env, .venv/)
to auto-copy/symlink into new worktrees
- .worktrees/ is auto-added to .gitignore
- Agent gets a system prompt note about the worktree context
- Config support: set worktree: true in config.yaml to always enable
Usage:
hermes -w # Interactive mode in worktree
hermes -w -q "Fix issue #123" # Single query in worktree
# Or in config.yaml:
worktree: true
Includes 17 tests covering: repo detection, worktree creation,
independence verification, cleanup (clean/dirty), .worktreeinclude,
.gitignore management, and 10 concurrent worktrees.
Closes#652
Long-lived gateway sessions can accumulate enough history that every new
message rehydrates an oversized transcript, causing repeated truncation
failures (finish_reason=length).
Add a session hygiene check in _handle_message that runs right after
loading the transcript and before invoking the agent:
1. Estimate message count and rough token count of the transcript
2. If above configurable thresholds (default: 200 msgs or 100K tokens),
auto-compress the transcript proactively
3. Notify the user about the compression with before/after stats
4. If still above warn threshold (default: 200K tokens) after
compression, suggest /reset
5. If compression fails on a dangerously large session, warn the user
to use /compress or /reset manually
Thresholds are configurable via config.yaml:
session_hygiene:
auto_compress_tokens: 100000
auto_compress_messages: 200
warn_tokens: 200000
This complements the agent's existing preflight compression (which
runs inside run_conversation) by catching pathological sessions at
the gateway layer before the agent is even created.
Includes 12 tests for threshold detection and token estimation.
Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require:
1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1)
2. A User-Agent header identifying a recognized coding agent
Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403
('only available for Coding Agents') errors.
Changes:
- Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1
- Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints
- Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged
- KIMI_BASE_URL env var override still takes priority over auto-detection
- Updated .env.example with correct docs and all endpoint options
- Fixed doctor.py health check for Kimi Code keys
Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)
Previously, when a session expired (idle/daily reset), the memory flush
ran synchronously inside get_or_create_session — blocking the user's
message for 10-60s while an LLM call saved memories.
Now a background watcher task (_session_expiry_watcher) runs every 5 min,
detects expired sessions, and flushes memories proactively in a thread
pool. By the time the user sends their next message, memories are
already saved and the response is immediate.
Changes:
- Add _is_session_expired(entry) to SessionStore — works from entry
alone without needing a SessionSource
- Add _pre_flushed_sessions set to track already-flushed sessions
- Remove sync _on_auto_reset callback from get_or_create_session
- Refactor flush into _flush_memories_for_session (sync worker) +
_async_flush_memories (thread pool wrapper)
- Add _session_expiry_watcher background task, started in start()
- Simplify /reset command to use shared fire-and-forget flush
- Add 10 tests for expiry detection, callback removal, tracking
Reduces token usage and latency for most tasks by defaulting to
medium reasoning effort instead of xhigh. Users can still override
via config or CLI flag. Updates code, tests, example config, and docs.
_convert_to_png() renamed the original file to .bmp before calling
ImageMagick convert, then unconditionally deleted the .bmp regardless
of whether convert succeeded. If convert failed, both files were gone.
- Only delete .bmp after confirmed successful conversion
- Restore original file on convert failure, timeout, or missing binary
- Add 3 tests covering failure, not-installed, and timeout scenarios
_make_cli() did not clear HERMES_MAX_ITERATIONS env var, so tests
failed in CI where the var was set externally. Also, default max_turns
changed from 60 to 90 in 0a82396 but tests were not updated.
- Clear HERMES_MAX_ITERATIONS in _make_cli() for proper isolation
- Add env_overrides parameter for tests that need specific env values
- Update hardcoded 60 assertions to 90 to match new default
- Simplify test_env_var_max_turns using env_overrides
When MarkdownV2 parsing fails, _strip_mdv2() removes escape backslashes
and bold markers (*text*) but missed italic markers (_text_). Users saw
raw underscores around italic text in the plaintext fallback.
- Add regex to strip _text_ italic markers in _strip_mdv2()
- Use word boundary lookaround to preserve snake_case identifiers
- Add tests for _strip_mdv2 covering italic, bold, snake_case, and edge cases
Add a 'platforms' field to SKILL.md frontmatter that restricts skills
to specific operating systems. Skills with platforms: [macos] only
appear in the system prompt, skills_list(), and slash commands on macOS.
Skills without the field load everywhere (backward compatible).
Implementation:
- skill_matches_platform() in tools/skills_tool.py — core filter
- Wired into all 3 discovery paths: prompt_builder.py, skills_tool.py,
skill_commands.py
- 28 new tests across 3 test files
New bundled Apple/macOS skills (all platforms: [macos]):
- imessage — Send/receive iMessages via imsg CLI
- apple-reminders — Manage Reminders via remindctl CLI
- apple-notes — Manage Notes via memo CLI
- findmy — Track devices/AirTags via AppleScript + screen capture
Docs updated: CONTRIBUTING.md, AGENTS.md, creating-skills.md,
skills.md (user guide)
Authored by areu01or00. Adds timezone support via hermes_time.now() helper
with IANA timezone resolution (HERMES_TIMEZONE env → config.yaml → server-local).
Updates system prompt timestamp, cron scheduling, and execute_code sandbox TZ
injection. Includes config migration (v4→v5) and comprehensive test coverage.
Introduces a new OpenClaw-to-Hermes migration skill with a Python
helper script that handles importing SOUL.md, memories, user profiles,
messaging settings, command allowlists, skills, TTS assets, and
workspace instructions.
Supports two migration presets (user-data / full), three skill conflict
modes (skip / overwrite / rename), overflow file export for entries that
exceed character limits, and granular include/exclude option filtering.
Includes detailed SKILL.md agent instructions covering the clarify-tool
interaction protocol, decision-to-command mapping, post-run reporting
rules, and path resolution guidance.
Adds dynamic panel width calculation to CLI clarify/approval widgets so
panels adapt to content and terminal size.
Includes 7 new tests covering presets, include/exclude, conflict modes,
overflow exports, and skills_guard integration.
Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn)
to the inference provider system. All use standard OpenAI-compatible
chat/completions endpoints with Bearer token auth.
Core changes:
- auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var
fields. Added providers to PROVIDER_REGISTRY. Added provider aliases
(glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key
providers in resolve_provider(). Added resolve_api_key_provider_credentials()
and get_api_key_provider_status() helpers.
- runtime_provider.py: Added generic API-key provider branch in
resolve_runtime_provider() — any provider with auth_type='api_key'
is automatically handled.
- main.py: Added providers to hermes model menu with generic
_model_flow_api_key_provider() flow. Updated _has_any_provider_configured()
to check all provider env vars. Updated argparse --provider choices.
- setup.py: Added providers to setup wizard with API key prompts and
curated model lists.
- config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY,
etc.) to OPTIONAL_ENV_VARS.
- status.py: Added API key display and provider status section.
- doctor.py: Added connectivity checks for each provider endpoint.
- cli.py: Updated provider docstrings.
Docs: Updated README.md, .env.example, cli-config.yaml.example,
cli-commands.md, environment-variables.md, configuration.md.
Tests: 50 new tests covering registry, aliases, resolution, auto-detection,
credential resolution, and runtime provider dispatch.
Inspired by PR #33 (numman-ali) which proposed a provider registry approach.
Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related
provider improvements merged earlier in this changeset.
Two bugs fixed:
1. search_messages() crashes with OperationalError when user queries
contain FTS5 special characters (+, ", (, {, dangling AND/OR, etc).
Added _sanitize_fts5_query() to strip dangerous operators and a
fallback try-except for edge cases.
2. _append_to_sqlite() in mirror.py creates a new SessionDB per call
but never closes it, leaking SQLite connections. Added finally block
to ensure db.close() is always called.
API key selection is now base_url-aware: when the resolved base_url
targets OpenRouter, OPENROUTER_API_KEY takes priority (preserving the
#289 fix). When hitting any other endpoint (Z.ai, vLLM, custom, etc.),
OPENAI_API_KEY takes priority so the OpenRouter key doesn't leak.
Applied in both the runtime provider resolver (the real code path) and
the CLI initial default (for consistency).
Fixes#560.
_make_cli() now patches CLI_CONFIG with clean defaults so
test_cli_init tests don't depend on the developer's local config.yaml.
test_empty_dir_returns_empty now mocks Path.home() so it doesn't pick
up a global SOUL.md.
Credit to teyrebaz33 for identifying and fixing these in PR #557.
Fixes#555.
tool_call_count was inaccurate in two ways:
1. Under-counting: an assistant message with N parallel tool calls
(e.g. "kill the light and shut off the fan" = 2 ha_call_service)
only incremented tool_call_count by 1 instead of N.
2. Over-counting: tool response messages (role=tool) also incremented
tool_call_count, double-counting every tool interaction.
Combined: 2 parallel tool calls produced tool_call_count=3 (1 from
assistant + 2 from tool responses) instead of the correct value of 2.
Fix: only count from assistant messages with tool_calls, incrementing
by len(tool_calls) to handle parallel calls correctly. Tool response
messages no longer affect tool_call_count.
This impacts /insights and /usage accuracy for sessions with tool use.
Two bugs in sync_skills():
1. Failed copytree poisons manifest: when shutil.copytree fails (disk
full, permission error), the skill is still recorded in the manifest.
On the next sync, the skill appears as "in manifest but not on disk"
which is interpreted as "user deliberately deleted it" — the skill
is never retried. Fix: only write to manifest on successful copy.
2. Failed update destroys user copy: rmtree deletes the existing skill
directory before copytree runs. If copytree then fails, the user's
skill is gone with no way to recover. Fix: move to .bak before
copying, restore from backup if copytree fails.
Both bugs are proven by new regression tests that fail on the old code
and pass on the fix.
Upgrade skills_sync manifest to v2 format (name:origin_hash). The origin
hash records the MD5 of the bundled skill at the time it was last synced.
On update, the user's copy is compared against the origin hash:
- User copy == origin hash → unmodified → safe to update from bundled
- User copy != origin hash → user customized → skip (preserve changes)
v1 manifests (plain names) are auto-migrated: the user's current hash
becomes the baseline, so future syncs can detect modifications.
Output now shows user-modified skills:
~ whisper (user-modified, skipping)
27 tests covering all scenarios including v1→v2 migration, user
modification detection, update after migration, and origin hash tracking.
2009 tests pass.
- Restored 21 skills removed in commits 757d012 and 740dd92:
accelerate, audiocraft, code-review, faiss, flash-attention, gguf,
grpo-rl-training, guidance, llava, nemo-curator, obliteratus, peft,
pytorch-fsdp, pytorch-lightning, simpo, slime, stable-diffusion,
tensorrt-llm, torchtitan, trl-fine-tuning, whisper
- Rewrote sync_skills() with proper update semantics:
* New skills (not in manifest): copied to user dir
* Existing skills (in manifest + on disk): updated via hash comparison
* User-deleted skills (in manifest, not on disk): respected, not re-added
* Stale manifest entries (removed from bundled): cleaned from manifest
- Added sync_skills() to CLI startup (cmd_chat) and gateway startup
(start_gateway) — previously only ran during 'hermes update'
- Updated cmd_update output to show new/updated/cleaned counts
- Rewrote tests: 20 tests covering manifest CRUD, dir hashing, fresh
install, user deletion respect, update detection, stale cleanup, and
name collision handling
75 bundled skills total. 2002 tests pass.
Issues found and fixed during deep code path review:
1. CRITICAL: Prefix matching returned wrong prices for dated model names
- 'gpt-4o-mini-2024-07-18' matched gpt-4o ($2.50) instead of gpt-4o-mini ($0.15)
- Same for o3-mini→o3 (9x), gpt-4.1-mini→gpt-4.1 (5x), gpt-4.1-nano→gpt-4.1 (20x)
- Fix: use longest-match-wins strategy instead of first-match
- Removed dangerous key.startswith(bare) reverse matching
2. CRITICAL: Top Tools section was empty for CLI sessions
- run_agent.py doesn't set tool_name on tool response messages (pre-existing)
- Insights now also extracts tool names from tool_calls JSON on assistant
messages, which IS populated for all sessions
- Uses max() merge strategy to avoid double-counting between sources
3. SELECT * replaced with explicit column list
- Skips system_prompt and model_config blobs (can be thousands of chars)
- Reduces memory and I/O for large session counts
4. Sets in overview dict converted to sorted lists
- models_with_pricing / models_without_pricing were Python sets
- Sets aren't JSON-serializable — would crash json.dumps()
5. Negative duration guard
- end > start check prevents negative durations from clock drift
6. Model breakdown sort fallback
- When all tokens are 0, now sorts by session count instead of arbitrary order
7. Removed unused timedelta import
Added 6 new tests: dated model pricing (4), tool_calls JSON extraction,
JSON serialization safety. Total: 69 tests.
Custom OAI endpoints, self-hosted models, and local inference should NOT
show fabricated cost estimates. Changed default pricing from $3/$12 per
million tokens to $0/$0 for unrecognized models.
- Added _has_known_pricing() to distinguish commercial vs custom models
- Models with known pricing show $ amounts; unknown models show 'N/A'
- Overview shows asterisk + note when some models lack pricing data
- Gateway format adds '(excludes custom/self-hosted models)' note
- Added 7 new tests for custom model cost handling
Inspired by Claude Code's /insights, adapted for Hermes Agent's multi-platform
architecture. Analyzes session history from state.db to produce comprehensive
usage insights.
Features:
- Overview stats: sessions, messages, tokens, estimated cost, active time
- Model breakdown: per-model sessions, tokens, and cost estimation
- Platform breakdown: CLI vs Telegram vs Discord etc. (unique to Hermes)
- Tool usage ranking: most-used tools with percentages
- Activity patterns: day-of-week chart, peak hours, streaks
- Notable sessions: longest, most messages, most tokens, most tool calls
- Cost estimation: real pricing data for 25+ models (OpenAI, Anthropic,
DeepSeek, Google, Meta) with fuzzy model name matching
- Configurable time window: --days flag (default 30)
- Source filtering: --source flag to filter by platform
Three entry points:
- /insights slash command in CLI (supports --days and --source flags)
- /insights slash command in gateway (compact markdown format)
- hermes insights CLI subcommand (standalone)
Includes 56 tests covering pricing helpers, format helpers, empty DB,
populated DB with multi-platform data, filtering, formatting, and edge cases.
Authored by Farukest. Fixes#432. Extracts _kill_port_process() helper
that uses netstat+taskkill on Windows and fuser on Linux. Previously,
fuser calls were inline with bare except-pass, so on Windows orphaned
bridge processes were never cleaned up — causing 'address already in use'
errors on reconnect. Includes 5 tests covering both platforms, port
matching edge cases, and exception suppression.
Authored by Farukest. Fixes#435. The retry summary in
_handle_max_iterations() hardcoded max_tokens instead of using
_max_tokens_param(), which returns max_completion_tokens for direct
OpenAI API (required by gpt-4o, o-series). The first attempt already
used _max_tokens_param correctly — only the retry path was wrong.
Includes 4 tests for _max_tokens_param provider detection.
Verifies explicit allowlist keys, catch-all _API_KEY/_TOKEN patterns,
case insensitivity, TERMINAL_SSH prefix, and config.yaml routing for
non-secret keys. Covers the fix from PR #469.
The mock handler checked for function_name == 'search' but the RPC
sends 'search_files'. Any test exercising search_files through the
mock would get 'Unknown tool' instead of the canned response.
The _TOOL_STUBS dict in code_execution_tool.py was out of sync with the
actual tool schemas, causing TypeErrors when the LLM used parameters it
sees in its system prompt but the sandbox stubs didn't accept:
search_files:
- Added missing params: context, offset, output_mode
- Fixed target default: 'grep' → 'content' (old value was obsolete)
patch:
- Added missing params: mode, patch (V4A multi-file patch support)
Also added 4 drift-detection tests (TestStubSchemaDrift) that will
catch future divergence between stubs and real schemas:
- test_stubs_cover_all_schema_params: every schema param in stub
- test_stubs_pass_all_params_to_rpc: every stub param sent over RPC
- test_search_files_target_uses_current_values: no obsolete values
- test_generated_module_accepts_all_params: generated code compiles
All 28 tests pass.
Authored by rovle. Adds Daytona as the sixth terminal execution backend
with cloud sandboxes, persistent workspaces, and full CLI/gateway integration.
Includes 24 unit tests and 8 integration tests.
The execute_code sandbox generates a hermes_tools.py stub module for LLM
scripts. Three common failure modes keep tripping up scripts:
1. json.loads(strict=True) rejects control chars in terminal() output
(e.g., GitHub issue bodies with literal tabs/newlines)
2. Shell backtick/quote interpretation when interpolating dynamic content
into terminal() commands (markdown with backticks gets eaten by bash)
3. No retry logic for transient network failures (API timeouts, rate limits)
Adds three convenience helpers to the generated hermes_tools module:
- json_parse(text) — json.loads with strict=False for tolerant parsing
- shell_quote(s) — shlex.quote() for safe shell interpolation
- retry(fn, max_attempts=3, delay=2) — exponential backoff wrapper
Also updates the EXECUTE_CODE_SCHEMA description to document these helpers
so LLMs know they're available without importing anything extra.
Includes 7 new tests (unit + integration) covering all three helpers.
The original implementation only supported xclip (X11), which silently
fails on WSL2 (can't access Windows clipboard for images), Wayland
desktops (xclip is X11-only), and VSCode terminal on WSL2.
Clipboard backend changes (hermes_cli/clipboard.py):
- WSL2: detect via /proc/version, use powershell.exe with .NET
System.Windows.Forms.Clipboard to extract images as base64 PNG
- Wayland: use wl-paste with MIME type detection, auto-convert BMP
to PNG for WSLg environments (via Pillow or ImageMagick)
- Dispatch order: WSL → Wayland → X11 (xclip), with fallthrough
- New has_clipboard_image() for lightweight clipboard checks
- Cache WSL detection result per-process
CLI changes (cli.py):
- /paste command: explicit clipboard image check for terminals where
BracketedPaste doesn't fire (image-only clipboard in VSCode/WinTerm)
- Ctrl+V keybinding: fallback for Linux terminals where Ctrl+V sends
raw byte instead of triggering bracketed paste
Tests: 80 tests (up from 37) covering WSL, Wayland, X11 dispatch,
BMP conversion, has_clipboard_image, and /paste command.