#4b1567f4 (anthhub) added qrcode to the messaging extra for Weixin's
QR login. The same package is needed by:
* hermes_cli/dingtalk_auth.py — QR device-flow auth shipped in #11574
* gateway/platforms/feishu.py:3962 — Feishu QR login
These extras are independent of [messaging] (users can install
hermes-agent[dingtalk] or hermes-agent[feishu] without [messaging]),
so the dep needs to be declared on each.
Pin matches anthhub's choice (>=7.0,<8) for consistency. The all
extra inherits from all three, so it picks up qrcode transitively.
Adds parallel tests to tests/test_project_metadata.py — same shape
as test_messaging_extra_includes_qrcode_for_weixin_setup.
Refs #9431.
Byte-level reasoning models (xiaomi/mimo-v2-pro, kimi, glm) can emit lone
surrogates in reasoning output. The proactive sanitizer walked content/
name/tool_calls but not extra fields like reasoning or the nested
reasoning_details array. Surrogates in those fields survived the
proactive pass, crashed json.dumps() in the OpenAI SDK, and the recovery
block's _sanitize_messages_surrogates(messages) call also didn't check
those fields — so 'found' was False, no retry happened, and after 3
attempts the user saw:
API call failed after 3 retries. 'utf-8' codec can't encode characters
in position N-M: surrogates not allowed
Changes:
- _sanitize_messages_surrogates: walk any extra string fields (reasoning,
reasoning_content, etc.) and recurse into nested dict/list values
(reasoning_details). Mirrors _sanitize_messages_non_ascii coverage
added in PR #10537.
- _sanitize_structure_surrogates: new recursive walker, mirror of
_sanitize_structure_non_ascii but for surrogate recovery.
- UnicodeEncodeError recovery block: also sanitize api_messages,
api_kwargs, and prefill_messages (not just the canonical messages
list — the API-copy carries reasoning_content transformed from
reasoning and that's what the SDK actually serializes). Always
retry on detected surrogate errors, not only when we found
something to strip — gate on error type per PR #10537's pattern.
Tests: extended tests/cli/test_surrogate_sanitization.py with
coverage for reasoning, reasoning_content, reasoning_details (flat
and deeply nested), structure walker, and an integration case that
reproduces the exact api_messages shape that was crashing.
* fix(tests): make AIAgent constructor calls self-contained (no env leakage)
Tests in tests/run_agent/ were constructing AIAgent() without passing
both api_key and base_url, then relying on leaked state from other
tests in the same xdist worker (or process-level env vars) to keep
provider resolution happy. Under hermetic conftest + pytest-split,
that state is gone and the tests fail with 'No LLM provider configured'.
Fix: pass both api_key and base_url explicitly on 47 AIAgent()
construction sites across 13 files. AIAgent.__init__ with both set
takes the direct-construction path (line 960 in run_agent.py) and
skips the resolver entirely.
One call site (test_none_base_url_passed_as_none) left alone — that
test asserts behavior for base_url=None specifically.
This is a prerequisite for any future matrix-split or stricter
isolation work, and lands cleanly on its own.
Validation:
- tests/run_agent/ full: 760 passed, 0 failed (local)
- Previously relied on cross-test pollution; now self-contained
* fix(tests): update opencode-go model order assertion to match kimi-k2.5-first
commit 78a74bb promoted kimi-k2.5 to first position in model suggestion
lists but didn't update this test, which has been failing on main since.
Reorder expected list to match the new canonical order.
- tui_gateway: new `setup.status` RPC that reuses CLI's
`_has_any_provider_configured()`, so the TUI can ask the same question
the CLI bootstrap asks before launching a session
- useSessionLifecycle: preflight `setup.status` before both `newSession`
and `resumeById`, and render a clear "Setup Required" panel when no
provider is configured instead of booting a session that immediately
fails with `agent init failed`
- createGatewayEventHandler: drop duplicate startup resume logic in
favor of the preflighted `resumeById`, and special-case the
no-provider agent-init error as a last-mile fallback to the same
setup panel
- add regression tests for both paths
- tui_gateway: route approvals through gateway callback (HERMES_GATEWAY_SESSION/
HERMES_EXEC_ASK) so dangerous commands emit approval.request instead of
silently falling through the CLI input() path and auto-denying
- approval UX: dedicated PromptZone between transcript and composer, safer
defaults (sel=0, numeric quick-picks, no Esc=deny), activity trail line,
outcome footer under the cost row
- text input: Ctrl+A select-all, real forward Delete, Ctrl+W always consumed
(fixes Ctrl+Backspace at cursor 0 inserting literal w)
- hermes-ink selection: swap synchronous onRender() for throttled
scheduleRender() on drag, and only notify React subscribers on presence
change — no more per-cell paint/subscribe spam
- useConfigSync: silence config.get polling failures instead of surfacing
'error: timeout: config.get' in the transcript
Previously a message like `<@&1490963422786093149> help` would spawn a
thread literally named `<@&1490963422786093149> help`, exposing raw
Discord mention markers in the thread list. Only user mentions
(`<@id>`) were being stripped upstream — role mentions (`<@&id>`) and
channel mentions (`<#id>`) leaked through.
Fix: strip all three mention patterns in `_auto_create_thread` before
building the thread name. Collapse runs of whitespace left by the
removal. If the entire content was mention-only, fall back to 'Hermes'
instead of an empty title.
Fixes#6336.
Tests: two new regression guards in test_discord_slash_commands.py
covering mixed-mention content and mention-only content.
Free-response channels already bypassed the @mention gate so users could
chat inline with the bot, but auto-threading still fired on every
message — spinning off a thread per message and defeating the
lightweight-chat purpose.
Fix: fold `is_free_channel` into `skip_thread` so threading is skipped
whenever the channel is in DISCORD_FREE_RESPONSE_CHANNELS (via env or
discord.free_response_channels in config.yaml).
Net change: one line in _handle_message + one regression test.
Partially addresses #9399. Authored by @Hypn0sis (salvaged from PR #9650;
the bundled 'smart' auto-thread mode from that PR was dropped in favor
of deterministic true/false semantics).
* fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction
The per-session AIAgent cache was unbounded. Each cached AIAgent holds
LLM clients, tool schemas, memory providers, and a conversation buffer.
In a long-lived gateway serving many chats/threads, cached agents
accumulated indefinitely — entries were only evicted on /new, /model,
or session reset.
Changes:
- Cache is now an OrderedDict so we can pop least-recently-used entries.
- _enforce_agent_cache_cap() pops entries beyond _AGENT_CACHE_MAX_SIZE=64
when a new agent is inserted. LRU order is refreshed via move_to_end()
on cache hits.
- _sweep_idle_cached_agents() evicts entries whose AIAgent has been idle
longer than _AGENT_CACHE_IDLE_TTL_SECS=3600s. Runs from the existing
_session_expiry_watcher so no new background task is created.
- The expiry watcher now also pops the cache entry after calling
_cleanup_agent_resources on a flushed session — previously the agent
was shut down but its reference stayed in the cache dict.
- Evicted agents have _cleanup_agent_resources() called on a daemon
thread so the cache lock isn't held during slow teardown.
Both tuning constants live at module scope so tests can monkeypatch
them without touching class state.
Tests: 7 new cases in test_agent_cache.py covering LRU eviction,
move_to_end refresh, cleanup thread dispatch, idle TTL sweep,
defensive handling of agents without _last_activity_ts, and plain-dict
test fixture tolerance.
* tweak: bump _AGENT_CACHE_MAX_SIZE 64 -> 128
* fix(gateway): never evict mid-turn agents; live spillover tests
The prior commit could tear down an active agent if its session_key
happened to be LRU when the cap was exceeded. AIAgent.close() kills
process_registry entries for the task, tears down the terminal
sandbox, closes the OpenAI client (sets self.client = None), and
cascades .close() into any active child subagents — all fatal if
the agent is still processing a turn.
Changes:
- _enforce_agent_cache_cap and _sweep_idle_cached_agents now look at
GatewayRunner._running_agents and skip any entry whose AIAgent
instance is present (identity via id(), so MagicMock doesn't
confuse lookup in tests). _AGENT_PENDING_SENTINEL is treated
as 'not active' since no real agent exists yet.
- Eviction only considers the LRU-excess window (first size-cap
entries). If an excess slot is held by a mid-turn agent, we skip
it WITHOUT compensating by evicting a newer entry. A freshly
inserted session (zero cache history) shouldn't be punished to
protect a long-lived one that happens to be busy.
- Cache may therefore stay transiently over cap when load spikes;
a WARNING is logged so operators can see it, and the next insert
re-runs the check after some turns have finished.
New tests (TestAgentCacheActiveSafety + TestAgentCacheSpilloverLive):
- Active LRU entry is skipped; no newer entry compensated
- Mixed active/idle excess window: only idle slots go
- All-active cache: no eviction, WARNING logged, all clients intact
- _AGENT_PENDING_SENTINEL doesn't block other evictions
- Idle-TTL sweep skips active agents
- End-to-end: active agent's .client survives eviction attempt
- Live fill-to-cap with real AIAgents, then spillover
- Live: CAP=4 all active + 1 newcomer — cache grows to 5, no teardown
- Live: 8 threads racing 160 inserts into CAP=16 — settles at 16
- Live: evicted session's next turn gets a fresh agent that works
30 tests pass (13 pre-existing + 17 new). Related gateway suites
(model switch, session reset, proxy, etc.) all green.
* fix(gateway): cache eviction preserves per-task state for session resume
The prior commits called AIAgent.close() on cache-evicted agents, which
tears down process_registry entries, terminal sandbox, and browser
daemon for that task_id — permanently. Fine for session-expiry (session
ended), wrong for cache eviction (session may resume).
Real-world scenario: a user leaves a Telegram session open for 2+ hours,
idle TTL evicts the cached AIAgent, user returns and sends a message.
Conversation history is preserved via SessionStore, but their terminal
sandbox (cwd, env vars, bg shells) and browser state were destroyed.
Fix: split the two cleanup modes.
close() Full teardown — session ended. Kills bg procs,
tears down terminal sandbox + browser daemon,
closes LLM client. Used by session-expiry,
/new, /reset (unchanged).
release_clients() Soft cleanup — session may resume. Closes
LLM client only. Leaves process_registry,
terminal sandbox, browser daemon intact
for the resuming agent to inherit via
shared task_id.
Gateway cache eviction (_enforce_agent_cache_cap, _sweep_idle_cached_agents)
now dispatches _release_evicted_agent_soft on the daemon thread instead
of _cleanup_agent_resources. All session-expiry call sites of
_cleanup_agent_resources are unchanged.
Tests (TestAgentCacheIdleResume, 5 new cases):
- release_clients does NOT call process_registry.kill_all
- release_clients does NOT call cleanup_vm / cleanup_browser
- release_clients DOES close the LLM client (agent.client is None after)
- close() vs release_clients() — semantic contract pinned
- Idle-evicted session's rebuild with same session_id gets same task_id
Updated test_cap_triggers_cleanup_thread to assert the soft path fires
and the hard path does NOT.
35 tests pass in test_agent_cache.py; 67 related tests green.
Match the row-budget naming introduced in PR #11260 for the approval and
clarify panels: rename chrome_reserve=14 into reserved_below=6 (input
chrome below the panel) + panel_chrome=6 (this panel's borders, blanks,
and hint row) + min_visible=3 (floor on visible items). Same arithmetic
as before, but a reviewer reading both files now sees the same handle.
Compact-chrome mode is intentionally not adopted — that pattern fits the
"fixed mandatory content might overflow" shape of approval/clarify
(solved by truncating with a marker), whereas the picker's overflow is
already handled by the scrolling viewport.
The /model picker rendered every choice into a prompt_toolkit Window
with no max height. Providers with many models (e.g. Ollama Cloud's 36+)
overflowed the terminal, clipping the bottom border and the last items.
- Add HermesCLI._compute_model_picker_viewport() to slide a scroll
offset that keeps the cursor on screen, sized from the live terminal
rows minus chrome reserved for input/status/border.
- Render only the visible slice in _get_model_picker_display() and
persist the offset on _model_picker_state across redraws.
- Bind ESC (eager) to close the picker, matching the Cancel button.
- Cover the viewport math with 8 unit tests in
tests/hermes_cli/test_model_picker_viewport.py.
Cron origin fallback extension (builds on #9193's _HOME_TARGET_ENV_VARS):
adds the three remaining origin-fallback-eligible platforms that have
home channel env vars configured in gateway/config.py but use non-generic
env var names:
- email → EMAIL_HOME_ADDRESS (non-standard suffix)
- dingtalk → DINGTALK_HOME_CHANNEL
- qqbot → QQ_HOME_CHANNEL (non-standard prefix: QQ_ not QQBOT_)
Picks up the completeness intent of @Xowiek's PR #11317 using the
architecturally-correct dict-based lookup from #9193, so platforms with
non-standard env var names actually resolve instead of silently missing.
Extended the parametrized regression test to cover the new three.
Weixin test mock alignment (builds on #10091's _send_session split):
Three test sites added in Batch 1 (TestWeixinSendImageFileParameterName)
and Batch 3 (TestWeixinVoiceSending) mocked only adapter._session, but
#10091 switched the send paths to check self._send_session. Added the
companion setter so the tests stay green with the session split in place.
- gateway/platforms/weixin.py:
- Split aiohttp.ClientSession into _poll_session and _send_session
- Add _LIVE_ADAPTERS registry so send_weixin_direct() reuses the connected gateway adapter instead of creating a competing session
- Fixes silent message loss when gateway is running (iLink token contention)
- cron/scheduler.py:
- Support comma-separated deliver values (e.g. 'feishu,weixin') for multi-target delivery
- Delay pconfig/enabled check until standalone fallback so live adapters work even when platform is not in gateway config
- tools/send_message_tool.py:
- Synthesize PlatformConfig from WEIXIN_* env vars when gateway config lacks a weixin entry
- Fall back to WEIXIN_HOME_CHANNEL env var for home channel resolution
- tests/gateway/test_weixin.py:
- Update mocks to include _send_session
Follow-ups to the salvaged commits in this PR:
* gateway/config.py — strip trailing whitespace from youngDoo's diff
(line 315 had ~140 trailing spaces).
* hermes_cli/tools_config.py — replace `config.get("platform_toolsets", {})`
with `config.get("platform_toolsets") or {}`. Handles the case where the
YAML key is present but explicitly null (parses as None, previously
crashed with AttributeError on the next line's .get(platform)).
Cherry-picked from yyq4193's #9003 with attribution.
* tests/gateway/test_config.py — 4 new tests for TestGetConnectedPlatforms
covering DingTalk via extras, via env vars, disabled, and missing creds.
* tests/hermes_cli/test_tools_config.py — regression test for the null
platform_toolsets edge case.
* scripts/release.py — add kagura-agent, youngDoo, yyq4193 to AUTHOR_MAP.
Co-authored-by: yyq4193 <39405770+yyq4193@users.noreply.github.com>
Fixes#11463: DingTalk channel receives messages but fails to reply
with 'No session_webhook available'.
Two changes:
1. **Fire-and-forget message processing**: process() now dispatches
_on_message as a background task via asyncio.create_task instead of
awaiting it. This ensures the SDK ACK is returned immediately,
preventing heartbeat timeouts and disconnections when message
processing takes longer than the SDK's ACK deadline.
2. **session_webhook extraction fallback**: If ChatbotMessage.from_dict()
fails to map the sessionWebhook field (possible across SDK versions),
the handler now falls back to extracting it directly from the raw
callback data dict using both 'sessionWebhook' and 'session_webhook'
key variants.
Added 3 tests covering webhook extraction, fallback behavior, and
fire-and-forget ACK timing.
* test: make test env hermetic; enforce CI parity via scripts/run_tests.sh
Fixes the recurring 'works locally, fails in CI' (and vice versa) class
of flakes by making tests hermetic and providing a canonical local runner
that matches CI's environment.
## Layer 1 — hermetic conftest.py (tests/conftest.py)
Autouse fixture now unsets every credential-shaped env var before every
test, so developer-local API keys can't leak into tests that assert
'auto-detect provider when key present'.
Pattern: unset any var ending in _API_KEY, _TOKEN, _SECRET, _PASSWORD,
_CREDENTIALS, _ACCESS_KEY, _PRIVATE_KEY, etc. Plus an explicit list of
credential names that don't fit the suffix pattern (AWS_ACCESS_KEY_ID,
FAL_KEY, GH_TOKEN, etc.) and all the provider BASE_URL overrides that
change auto-detect behavior.
Also unsets HERMES_* behavioral vars (HERMES_YOLO_MODE, HERMES_QUIET,
HERMES_SESSION_*, etc.) that mutate agent behavior.
Also:
- Redirects HOME to a per-test tempdir (not just HERMES_HOME), so
code reading ~/.hermes/* directly can't touch the real dir.
- Pins TZ=UTC, LANG=C.UTF-8, LC_ALL=C.UTF-8, PYTHONHASHSEED=0 to
match CI's deterministic runtime.
The old _isolate_hermes_home fixture name is preserved as an alias so
any test that yields it explicitly still works.
## Layer 2 — scripts/run_tests.sh canonical runner
'Always use scripts/run_tests.sh, never call pytest directly' is the
new rule (documented in AGENTS.md). The script:
- Unsets all credential env vars (belt-and-suspenders for callers
who bypass conftest — e.g. IDE integrations)
- Pins TZ/LANG/PYTHONHASHSEED
- Uses -n 4 xdist workers (matches GHA ubuntu-latest; -n auto on
a 20-core workstation surfaces test-ordering flakes CI will never
see, causing the infamous 'passes in CI, fails locally' drift)
- Finds the venv in .venv, venv, or main checkout's venv
- Passes through arbitrary pytest args
Installs pytest-split on demand so the script can also be used to run
matrix-split subsets locally for debugging.
## Remove 3 module-level dotenv stubs that broke test isolation
tests/hermes_cli/test_{arcee,xiaomi,api_key}_provider.py each had a
module-level:
if 'dotenv' not in sys.modules:
fake_dotenv = types.ModuleType('dotenv')
fake_dotenv.load_dotenv = lambda *a, **kw: None
sys.modules['dotenv'] = fake_dotenv
This patches sys.modules['dotenv'] to a fake at import time with no
teardown. Under pytest-xdist LoadScheduling, whichever worker collected
one of these files first poisoned its sys.modules; subsequent tests in
the same worker that imported load_dotenv transitively (e.g.
test_env_loader.py via hermes_cli.env_loader) got the no-op lambda and
saw their assertions fail.
dotenv is a required dependency (python-dotenv>=1.2.1 in pyproject.toml),
so the defensive stub was never needed. Removed.
## Validation
- tests/hermes_cli/ alone: 2178 passed, 1 skipped, 0 failed (was 4
failures in test_env_loader.py before this fix)
- tests/test_plugin_skills.py, tests/hermes_cli/test_plugins.py,
tests/test_hermes_logging.py combined: 123 passed (the caplog
regression tests from PR #11453 still pass)
- Local full run shows no F/E clusters in the 0-55% range that were
previously present before the conftest hardening
## Background
See AGENTS.md 'Testing' section for the full list of drift sources
this closes. Matrix split (closed as #11566) will be re-attempted
once this foundation lands — cross-test pollution was the root cause
of the shard-3 hang in that PR.
* fix(conftest): don't redirect HOME — it broke CI subprocesses
PR #11577's autouse fixture was setting HOME to a per-test tempdir.
CI started timing out at 97% complete with dozens of E/F markers and
orphan python processes at cleanup — tests (or transitive deps)
spawn subprocesses that expect a stable HOME, and the redirect broke
them in non-obvious ways.
Env-var unsetting and TZ/LANG/hashseed pinning (the actual CI-drift
fixes) are unchanged and still in place. HERMES_HOME redirection is
also unchanged — that's the canonical way to isolate tests from
~/.hermes/, not HOME.
Any code in the codebase reading ~/.hermes/* via `Path.home() / ".hermes"`
instead of `get_hermes_home()` is a bug to fix at the callsite, not
something to paper over in conftest.
Two follow-ups to the cherry-picked PR #9873 (`e3bcc819`):
1. `_is_allowed_user` now uses `getattr(self, '_allowed_*_ids', set())`
so test fixtures that build the adapter via `object.__new__`
(skipping __init__) don't crash with AttributeError.
See AGENTS.md pitfall #17 — same pattern as gateway.run.
2. New 3-case regression coverage in test_discord_bot_auth_bypass.py:
- role-only config bypasses the gateway 'no allowlists' branch
- roles + users combined still authorizes user-allowlist matches
- the role bypass does NOT leak to other platforms (Telegram, etc.)
3. Autouse fixture in test_discord_bot_auth_bypass.py clears all Discord
auth env vars before each test so DISCORD_ALLOWED_ROLES leakage from
a previous test in the session can't flip later 'should-reject' tests
into false-pass.
Required because the bare cherry-pick of #9873 only added the adapter-
level role check — it didn't cover the gateway-level _is_user_authorized,
which still rejected role-only setups via the 'no allowlists configured'
branch.
Six test cases covering:
- DISCORD_ALLOW_BOTS=mentions + bot not in DISCORD_ALLOWED_USERS → authorized
- DISCORD_ALLOW_BOTS=all + bot not in DISCORD_ALLOWED_USERS → authorized
- DISCORD_ALLOW_BOTS=none → bots still rejected (preserves security)
- DISCORD_ALLOW_BOTS unset → same as 'none'
- Humans still checked against allowlist even with allow_bots=all
- Bot bypass is Discord-specific — doesn't leak to other platforms
Guards against a regression where the is_bot bypass in _is_user_authorized
gets moved, removed, or accidentally extended to other platforms.
Closes#11321, closes#10259.
## Problem
The nested /skill command group (category subcommand groups + skill
subcommands) serialized to ~14KB with the default 75-skill catalog,
exceeding Discord's ~8000-byte per-command registration payload. The
entire tree.sync() rejected with error 50035 — ALL slash commands
including the 27 base commands failed to register.
## Fix
Replace the nested Group layout with a single flat Command:
/skill name:<autocomplete> args:<optional string>
Autocomplete options are fetched dynamically by Discord when the user
types — they do NOT count against the per-command registration budget.
So this single command registers at ~200 bytes regardless of how many
skills exist. Scales to thousands of skills with no size calculations,
no splitting, no hidden skills.
UX improvements:
- Discord live-filters by user's typed prefix against BOTH name and
description, so '/skill pdf' finds 'ocr-and-documents' via its
description. More discoverable than clicking through category menus.
- Unknown skill name → ephemeral error pointing user at autocomplete.
- Stable alphabetical ordering across restarts.
## Why not the other proposed approaches
Three prior PRs tried to fit within the 8KB limit by modifying the
nested layout:
- #10214 (njiangk): truncated all descriptions to 'Run <name>' and
category descriptions to 'Skills'. Works but destroys slash picker UX.
- #11385 (LeonSGP43): 40-char description clamp + iterative
trim-largest-category fallback. Works but HIDES skills the user can
no longer invoke via slash — functional regression.
- #10261 (zeapsu): adaptive split into /skill-<cat> top-level groups.
Preserves all skills but pollutes the slash namespace with 20
top-level commands.
All three work around the symptom. The flat autocomplete design
dissolves the problem — there is no payload-size pressure to manage.
## Tests
tests/gateway/test_discord_slash_commands.py — 5 new test cases replace
the 3 old nested-structure tests:
- flat-not-nested structure assertion
- empty skills → no command registered
- callback dispatches the right cmd_key by name
- unknown name → ephemeral error, no dispatch
- large-catalog regression guard (500 skills) — command payload stays
under 500 bytes regardless
E2E validated against real discord.py 2.7.1:
- Command registers as discord.app_commands.Command (not Group).
- Autocomplete filters by name AND description (verified across several
queries including description-only matches like 'pdf' → OCR skill).
- 500-skill catalog returns max 25 results per autocomplete query
(Discord's hard cap), filtered correctly.
- Choice labels formatted as 'name — description' clamped to 100 chars.
Adds 15 regression tests for hermes_cli/dingtalk_auth.py covering:
* _api_post — network error mapping, errcode-nonzero mapping, success path
* begin_registration — 2-step chain, missing-nonce/device_code/uri
error cases
* wait_for_registration_success — success path, missing-creds guard,
on_waiting callback invocation
* render_qr_to_terminal — returns False when qrcode missing, prints
when available
* Configuration — BASE_URL default + override, SOURCE default
Also adds a one-line disclosure in dingtalk_qr_auth() telling users
the scan page will be OpenClaw-branded. Interim measure: DingTalk's
registration portal is hardcoded to route all sources to /openapp/
registration/openClaw, so users see OpenClaw branding regardless of
what 'source' value we send. We keep 'openClaw' as the source token
until DingTalk-Real-AI registers a Hermes-specific template.
Also adds meng93 to scripts/release.py AUTHOR_MAP.
- stop rewriting markdown tables, headings, and links before delivery
- keep markdown table blocks and headings together during chunking
- update Weixin tests and docs for native markdown rendering
Closes#10308
Three open issues — #8242, #6587, #11345 — all trace to the same root
cause: the image / audio / document download paths in
`DiscordAdapter._handle_message` used plain, unauthenticated HTTP to
fetch `att.url`. That broke in three independent ways:
#8242 cdn.discordapp.com attachment URLs increasingly require the
bot session to download; unauthenticated httpx sees 403
Forbidden, image/voice analysis fail silently.
#6587 Some user environments (VPNs, corporate DNS, tunnels) resolve
cdn.discordapp.com to private-looking IPs. Our is_safe_url()
guard correctly blocks them as SSRF risks, but the user
environment is legitimate — image analysis and voice STT die.
#11345 The document download path skipped is_safe_url() entirely —
raw aiohttp.ClientSession.get(att.url) with no SSRF check,
inconsistent with the image/audio branches.
Unified fix: use `discord.Attachment.read()` as the primary download
path on all three branches. `att.read()` routes through discord.py's
own authenticated HTTPClient, so:
- Discord CDN auth is handled (#8242 resolved).
- Our is_safe_url() gate isn't consulted for the attachment path at
all — the bot session handles networking internally (#6587 resolved).
- All three branches now share the same code path, eliminating the
document-path SSRF gap (#11345 resolved).
Falls back to the existing cache_*_from_url helpers (image/audio) or an
SSRF-gated aiohttp fetch (documents) when `att.read()` is unavailable
or fails — preserves defense-in-depth for any future payload-schema
drift that could slip a non-CDN URL into att.url.
New helpers on DiscordAdapter:
- _read_attachment_bytes(att) — safe att.read() wrapper
- _cache_discord_image(att, ext) — primary + URL fallback
- _cache_discord_audio(att, ext) — primary + URL fallback
- _cache_discord_document(att, ext) — primary + SSRF-gated aiohttp fallback
Tests:
- tests/gateway/test_discord_attachment_download.py — 12 new cases
covering all three helpers: primary path, fallback on missing
.read(), fallback on validator rejection, SSRF guard on document
fallback, aiohttp fallback happy-path, and an E2E case via
_handle_message confirming cache_image_from_url is never invoked
when att.read() succeeds.
- All 11 existing document-handling tests continue to pass via the
aiohttp fallback path (their SimpleNamespace attachments have no
.read(), which triggers the fallback — now SSRF-gated).
Closes#8242, closes#6587, closes#11345.
When a WebSocket-based platform adapter (e.g. QQ Bot) temporarily
loses its connection, send() now polls is_connected for up to 15s
instead of immediately returning a non-retryable failure. If the
auto-reconnect completes within the window, the message is delivered
normally. On timeout, the SendResult is marked retryable=True so the
base class retry mechanism can attempt re-delivery.
Same treatment applied to _send_media().
Adds 4 async tests covering:
- Successful send after simulated reconnection
- Retryable failure on timeout
- Immediate success when already connected
- _send_media reconnection wait
Fixes#11163
Adds 16 regression tests for the gating logic introduced in the
salvaged commit:
* TestAllowedUsersGate — empty/wildcard/case-insensitive matching,
staff_id vs sender_id, env var CSV population
* TestMentionPatterns — compilation, case-insensitivity, invalid
regex is skipped-not-raised, JSON env var, newline fallback
* TestShouldProcessMessage — DM always accepted, group gating via
require_mention / is_in_at_list / wake-word pattern / free_response_chats
Also adds yule975 to scripts/release.py AUTHOR_MAP (release CI blocks
unmapped emails).
The Copilot API returns HTTP 400 "model_not_supported" when it receives a
model ID it doesn't recognize (vendor-prefixed like
`anthropic/claude-sonnet-4.6` or dash-notation like `claude-sonnet-4-6`).
Two bugs combined to leave both formats unhandled:
1. `_COPILOT_MODEL_ALIASES` in hermes_cli/models.py only covered bare
dot-notation and vendor-prefixed dot-notation. Hermes' default Claude
IDs elsewhere use hyphens (anthropic native format), and users with an
aggregator-style config who switch `model.provider` to `copilot`
inherit `anthropic/claude-X-4.6` — neither case was in the table.
2. The Copilot branch of `normalize_model_for_provider()` only stripped
the vendor prefix when it matched the target provider (`copilot/`) or
was the special-cased `openai/` for openai-codex. Every other vendor
prefix survived to the Copilot request unchanged.
Fix:
- Add dash-notation aliases (`claude-{opus,sonnet,haiku}-4-{5,6}` and the
`anthropic/`-prefixed variants) to the alias table.
- Rewire the Copilot / Copilot-ACP branch of
`normalize_model_for_provider()` to delegate to the existing
`normalize_copilot_model_id()`. That function already does alias
lookups, catalog-aware resolution, and vendor-prefix fallback — it was
being bypassed for the generic normalisation entry point.
Because `switch_model()` already calls `normalize_model_for_provider()`
for every `/model` switch (line 685 in model_switch.py), this single fix
covers the CLI startup path (cli.py), the `/model` slash command path,
and the gateway load-from-config path.
Closes#6879
Credits dsr-restyn (#6743) who independently diagnosed the dash-notation
case; their aliases are folded into this consolidated fix alongside the
vendor-prefix stripping repair.
Follow-up to the reply-reference fix: `_make_discord_adapter` used to return
the raw fetched `Message` as the expected reference, but the adapter now
wraps it via `ref_msg.to_reference(fail_if_not_exists=False)` so Discord
treats a deleted target as 'send without reply chip'. Update the fixture
to return the MessageReference sentinel so the 4 chunk-reference-identity
tests assert against the right object.
No production behavior change; only aligns the stale test fixture.
Follow-up to the reply-reference fix: ensure errors unrelated to the reply
reference (e.g. 50013 Missing Permissions) do NOT trigger the no-reference
retry path and still surface as a failed SendResult. Keeps the wider retry
condition from silently swallowing unrelated API errors.
Proposed in the original issue writeup (#11342) as test case
`test_non_reference_errors_still_propagate`.
* feat(skills): add 'hermes skills reset' to un-stick bundled skills
When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.
Adds an escape hatch for this case.
hermes skills reset <name>
Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
re-baselines against the user's current copy. Future 'hermes update'
runs accept upstream changes again. Non-destructive.
hermes skills reset <name> --restore
Also deletes the user's copy and re-copies the bundled version.
Use when you want the pristine upstream skill back.
Also available as /skills reset in chat.
- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
repro, --restore, unknown-skill error, upstream-removed-skill, and
no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
section explaining the origin-hash mechanic + reset usage
* fix(auth): codex auth remove no longer silently undone by auto-import
'hermes auth remove openai-codex' appeared to succeed but the credential
reappeared on the next command. Two compounding bugs:
1. _seed_from_singletons() for openai-codex unconditionally re-imports
tokens from ~/.codex/auth.json whenever the Hermes auth store is
empty (by design — the Codex CLI and Hermes share that file). There
was no suppression check, unlike the claude_code seed path.
2. auth_remove_command's cleanup branch only matched
removed.source == 'device_code' exactly. Entries added via
'hermes auth add openai-codex' have source 'manual:device_code', so
for those the Hermes auth store's providers['openai-codex'] state was
never cleared on remove — the next load_pool() re-seeded straight
from there.
Net effect: there was no way to make a codex removal stick short of
manually editing both ~/.hermes/auth.json and ~/.codex/auth.json before
opening Hermes again.
Fix:
- Add unsuppress_credential_source() helper (mirrors
suppress_credential_source()).
- Gate the openai-codex branch in _seed_from_singletons() with
is_source_suppressed(), matching the claude_code pattern.
- Broaden auth_remove_command's codex match to handle both
'device_code' and 'manual:device_code' (via endswith check), always
call suppress_credential_source(), and print guidance about the
unchanged ~/.codex/auth.json file.
- Clear the suppression marker in auth_add_command's openai-codex
branch so re-linking via 'hermes auth add openai-codex' works.
~/.codex/auth.json is left untouched — that's the Codex CLI's own
credential store, not ours to delete.
Tests cover: unsuppress helper behavior, remove of both source
variants, add clears suppression, seed respects suppression. E2E
verified: remove → load → add → load flow now behaves correctly.
Add TestWeixinSendImageFileParameterName test class with two tests:
- test_send_image_file_uses_image_path_parameter: verifies the correct
parameter name (image_path) is used when gateway calls send_image_file
- test_send_image_file_works_without_optional_params: ensures minimal
params work correctly
This prevents the interface from drifting again as noted by Copilot.
discord.py does not apply a default AllowedMentions to the client, so any
reply whose content contains @everyone/@here or a role mention would ping
the whole server — including verbatim echoes of user input or LLM output
that happens to contain those tokens.
Set a safe default on commands.Bot: everyone=False, roles=False,
users=True, replied_user=True. Operators can opt back in via four
DISCORD_ALLOW_MENTION_* env vars or discord.allow_mentions.* in
config.yaml. No behavior change for normal user/reply pings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First pass of test-suite reduction to address flaky CI and bloat.
Removed tests that fall into these change-detector patterns:
1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests
that call inspect.getsource() on production modules and grep for string
literals. Break on any refactor/rename even when behavior is correct.
2. Platform enum tautologies (every gateway/test_X.py): assertions like
`Platform.X.value == 'x'` duplicated across ~9 adapter test files.
3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that
only verify a key exists in a dict. Data-layout tests, not behavior.
4. Argparse wiring tests (test_argparse_flag_propagation, test_subparser_routing
_fallback): tests that do parser.parse_args([...]) then assert args.field.
Tests Python's argparse, not our code.
5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch
cmd_X, call plugins_command with matching action, assert mock called.
Tests the if/elif chain, not behavior.
6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests,
test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests
that mock the external API client, call our function, and assert exact
kwargs. Break on refactor even when behavior is preserved.
7. Schedule-internal "function-was-called" tests (acp/test_server scheduling
tests): tests that patch own helper method, then assert it was called.
Kept behavioral tests throughout: error paths (pytest.raises), security
tests (path traversal, SSRF, redaction), message alternation invariants,
provider API format conversion, streaming logic, memory contract, real
config load/merge tests.
Net reduction: 169 tests removed. 38 empty classes cleaned up.
Collected before: 12,522 tests
Collected after: 12,353 tests
The cache-read, cache-write, and total estimated-cost values shown in
/insights (and the per-model Cost column) were unreliable. Hide them from
both terminal and gateway renderings.
The underlying data pipeline is untouched — sessions still store
cache_read_tokens, cache_write_tokens, and estimated_cost_usd; the web
server, /usage command, and status bar are unaffected. Only the
InsightsEngine display layer is trimmed.
Changes:
- format_terminal: drop 'Cache read / Cache write' line, drop 'Est. cost'
from the Total tokens row, drop per-model 'Cost' column, drop the
'* Cost N/A for custom/self-hosted' footnote.
- format_gateway: drop cache breakdown from Tokens line, drop 'Est. cost'
line, drop per-model cost suffix.
- Tests updated to assert these strings are now absent.
* feat(skills): add 'hermes skills reset' to un-stick bundled skills
When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.
Adds an escape hatch for this case.
hermes skills reset <name>
Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
re-baselines against the user's current copy. Future 'hermes update'
runs accept upstream changes again. Non-destructive.
hermes skills reset <name> --restore
Also deletes the user's copy and re-copies the bundled version.
Use when you want the pristine upstream skill back.
Also available as /skills reset in chat.
- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
repro, --restore, unknown-skill error, upstream-removed-skill, and
no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
section explaining the origin-hash mechanic + reset usage
* fix(nous): respect 'Skip (keep current)' after OAuth login
When a user already set up on another provider (e.g. OpenRouter) runs
`hermes model` and picks Nous Portal, OAuth succeeds and then a model
picker is shown. If the user picks 'Skip (keep current)', the previous
provider + model should be preserved.
Previously, \_update_config_for_provider was called unconditionally after
login, which flipped config.yaml model.provider to 'nous' while keeping
the old model.default (e.g. anthropic/claude-opus-4.6 from OpenRouter),
leaving the user with a mismatched provider/model pair on the next
request.
Fix: snapshot the prior active_provider before login, and if no model is
selected (Skip, or no models available, or fetch failure), restore the
prior active_provider and leave config.yaml untouched. The Nous OAuth
tokens stay saved so future `hermes model` -> Nous works without
re-authenticating.
Test plan:
- New tests cover Skip path (preserves provider+model, saves creds),
pick-a-model path (switches to nous), and fresh-install Skip path
(active_provider cleared, not stuck as 'nous').
The cherry-picked SDK compat fix (previous commit) wired process() to
parse CallbackMessage.data into a ChatbotMessage, but _extract_text()
was still written against the pre-0.20 payload shape:
* message.text changed from dict {content: ...} → TextContent object.
The old code's str(text) fallback produced 'TextContent(content=...)'
as the agent's input, so every received message came in mangled.
* rich_text moved from message.rich_text (list) to
message.rich_text_content.rich_text_list.
This preserves legacy fallbacks (dict-shaped text, bare rich_text list)
while handling the current SDK layout via hasattr(text, 'content').
Adds regression tests covering:
* webhook domain allowlist (api.*, oapi.*, and hostile lookalikes)
* _IncomingHandler.process is a coroutine function
* _extract_text against TextContent object, dict, rich_text_content,
legacy rich_text, and empty-message cases
Also adds kevinskysunny to scripts/release.py AUTHOR_MAP (release CI
blocks unmapped emails).
When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.
Adds an escape hatch for this case.
hermes skills reset <name>
Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
re-baselines against the user's current copy. Future 'hermes update'
runs accept upstream changes again. Non-destructive.
hermes skills reset <name> --restore
Also deletes the user's copy and re-copies the bundled version.
Use when you want the pristine upstream skill back.
Also available as /skills reset in chat.
- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
repro, --restore, unknown-skill error, upstream-removed-skill, and
no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
section explaining the origin-hash mechanic + reset usage
Three tests in tests/test_plugin_skills.py and tests/hermes_cli/test_plugins.py
used caplog.at_level(logging.WARNING) without specifying a logger. When another
test earlier in the same xdist worker touched propagation on tools.skills_tool
or hermes_cli.plugins, caplog would miss the warning and the assertion would
fail intermittently in CI.
These three tests accounted for 15 of the last ~30 Tests workflow failures
(5 each), including the recent main failure on commit 436a7359 (PR #11398).
Fix: pass logger="tools.skills_tool" / logger="hermes_cli.plugins" to
caplog.at_level() so the handler attaches directly to the logger under test
and capture is independent of global propagation state.
Affected tests:
- tests/test_plugin_skills.py::TestSkillViewPluginGuards::test_injection_logged_but_served
- tests/hermes_cli/test_plugins.py::TestPluginCommands::test_register_command_empty_name_rejected
- tests/hermes_cli/test_plugins.py::TestPluginCommands::test_register_command_builtin_conflict_rejected
No production code change. Verified passing under xdist (-n 4) alongside
test_hermes_logging.py (the test most likely to poison the logger state).
Extends test_build_event_handler_registers_reaction_and_card_processors
to assert that register_p2_im_chat_access_event_bot_p2p_chat_entered_v1
and register_p2_im_message_recalled_v1 are called when building the
event handler, matching the production registrations.
Also adds Fatty911 to scripts/release.py AUTHOR_MAP for credit on the
salvaged event-handler fix.
* feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro
Upstream asked for these two upgrades ASAP — the old entries show
stale models when newer, higher-quality versions are available on FAL.
Recraft V3 → Recraft V4 Pro
ID: fal-ai/recraft-v3 → fal-ai/recraft/v4/pro/text-to-image
Price: $0.04/image → $0.25/image (6x — V4 Pro is premium tier)
Schema: V4 dropped the required `style` enum entirely; defaults
handle taste now. Added `colors` and `background_color`
to supports for brand-palette control. `seed` is not
supported by V4 per the API docs.
Nano Banana → Nano Banana Pro
ID: fal-ai/nano-banana → fal-ai/nano-banana-pro
Price: $0.08/image → $0.15/image (1K); $0.30 at 4K
Schema: Aspect ratio family unchanged. Added `resolution`
(1K/2K/4K, default 1K for billing predictability),
`enable_web_search` (real-time info grounding, +$0.015),
and `limit_generations` (force exactly 1 image).
Architecture: Gemini 2.5 Flash → Gemini 3 Pro Image. Quality
and reasoning depth improved; slower (~6s → ~8s).
Migration: users who had the old IDs in `image_gen.model` will
fall through the existing 'unknown model → default' warning path
in `_resolve_fal_model()` and get the Klein 9B default on the next
run. Re-run `hermes tools` → Image Generation to pick the new
version. No silent cost-upgrade aliasing — the 2-6x price jump
on these tiers warrants explicit user re-selection.
Portal note: both new model IDs need to be allowlisted on the
Nous fal-queue-gateway alongside the previous 7 additions, or
users on Nous Subscription will see the 'managed gateway rejected
model' error we added previously (which is clear and
self-remediating, just noisy).
* docs: wrap '<1s' in backticks to unblock MDX compilation
Docusaurus's MDX parser treats unquoted '<' as the start of JSX, and
'<1s' fails because '1' isn't a valid tag-name start character. This
was broken on main since PR #11265 (never noticed because
docs-site-checks was failing on OTHER issues at the time and we
admin-merged through it).
Wrapping in backticks also gives the cell monospace styling which
reads more cleanly alongside the inline-code model ID in the same row.
The other '<1s' occurrence (line 52) is inside a fenced code block
and is already safe — code fences bypass MDX parsing.
* feat(mcp-oauth): scaffold MCPOAuthManager
Central manager for per-server MCP OAuth state. Provides
get_or_build_provider (cached), remove (evicts cache + deletes
disk), invalidate_if_disk_changed (mtime watch, core fix for
external-refresh workflow), and handle_401 (dedup'd recovery).
No behavior change yet — existing call sites still use
build_oauth_auth directly. Task 1 of 8 in the MCP OAuth
consolidation (fixes Cthulhu's BetterStack reliability issues).
* feat(mcp-oauth): add HermesMCPOAuthProvider with pre-flow disk watch
Subclasses the MCP SDK's OAuthClientProvider to inject a disk
mtime check before every async_auth_flow, via the central
manager. When a subclass instance is used, external token
refreshes (cron, another CLI instance) are picked up before
the next API call.
Still dead code: the manager's _build_provider still delegates
to build_oauth_auth and returns the plain OAuthClientProvider.
Task 4 wires this subclass in. Task 2 of 8.
* refactor(mcp-oauth): extract build_oauth_auth helpers
Decomposes build_oauth_auth into _configure_callback_port,
_build_client_metadata, _maybe_preregister_client, and
_parse_base_url. Public API preserved. These helpers let
MCPOAuthManager._build_provider reuse the same logic in Task 4
instead of duplicating the construction dance.
Also updates the SDK version hint in the warning from 1.10.0 to
1.26.0 (which is what we actually require for the OAuth types
used here). Task 3 of 8.
* feat(mcp-oauth): manager now builds HermesMCPOAuthProvider directly
_build_provider constructs the disk-watching subclass using the
helpers from Task 3, instead of delegating to the plain
build_oauth_auth factory. Any consumer using the manager now gets
pre-flow disk-freshness checks automatically.
build_oauth_auth is preserved as the public API for backwards
compatibility. The code path is now:
MCPOAuthManager.get_or_build_provider ->
_build_provider ->
_configure_callback_port
_build_client_metadata
_maybe_preregister_client
_parse_base_url
HermesMCPOAuthProvider(...)
Task 4 of 8.
* feat(mcp): wire OAuth manager + add _reconnect_event
MCPServerTask gains _reconnect_event alongside _shutdown_event.
When set, _run_http / _run_stdio exit their async-with blocks
cleanly (no exception), and the outer run() loop re-enters the
transport to rebuild the MCP session with fresh credentials.
This is the recovery path for OAuth failures that the SDK's
in-place httpx.Auth cannot handle (e.g. cron externally consumed
the refresh_token, or server-side session invalidation).
_run_http now asks MCPOAuthManager for the OAuth provider
instead of calling build_oauth_auth directly. Config-time,
runtime, and reconnect paths all share one provider instance
with pre-flow disk-watch active.
shutdown() defensively sets both events so there is no race
between reconnect and shutdown signalling.
Task 5 of 8.
* feat(mcp): detect auth failures in tool handlers, trigger reconnect
All 5 MCP tool handlers (tool call, list_resources, read_resource,
list_prompts, get_prompt) now detect auth failures and route
through MCPOAuthManager.handle_401:
1. If the manager says recovery is viable (disk has fresh tokens,
or SDK can refresh in-place), signal MCPServerTask._reconnect_event
to tear down and rebuild the MCP session with fresh credentials,
then retry the tool call once.
2. If no recovery path exists, return a structured needs_reauth
JSON error so the model stops hallucinating manual refresh
attempts (the 'let me curl the token endpoint' loop Cthulhu
pasted from Discord).
_is_auth_error catches OAuthFlowError, OAuthTokenError,
OAuthNonInteractiveError, and httpx.HTTPStatusError(401). Non-auth
exceptions still surface via the generic error path unchanged.
Task 6 of 8.
* feat(mcp-cli): route add/remove through manager, add 'hermes mcp login'
cmd_mcp_add and cmd_mcp_remove now go through MCPOAuthManager
instead of calling build_oauth_auth / remove_oauth_tokens
directly. This means CLI config-time state and runtime MCP
session state are backed by the same provider cache — removing
a server evicts the live provider, adding a server populates
the same cache the MCP session will read from.
New 'hermes mcp login <name>' command:
- Wipes both the on-disk tokens file and the in-memory
MCPOAuthManager cache
- Triggers a fresh OAuth browser flow via the existing probe
path
- Intended target for the needs_reauth error Task 6 returns
to the model
Task 7 of 8.
* test(mcp-oauth): end-to-end integration tests
Five new tests exercising the full consolidation with real file
I/O and real imports (no transport mocks):
1. external_refresh_picked_up_without_restart — Cthulhu's cron
workflow. External process writes fresh tokens to disk;
on the next auth flow the manager's mtime-watch flips
_initialized and the SDK re-reads from storage.
2. handle_401_deduplicates_concurrent_callers — 10 concurrent
handlers for the same failed token fire exactly ONE recovery
attempt (thundering-herd protection).
3. handle_401_returns_false_when_no_provider — defensive path
for unknown servers.
4. invalidate_if_disk_changed_handles_missing_file — pre-auth
state returns False cleanly.
5. provider_is_reused_across_reconnects — cache stickiness so
reconnects preserve the disk-watch baseline mtime.
Task 8 of 8 — consolidation complete.
* docs: fix ascii-guard border alignment errors
Three docs pages had ASCII diagram boxes with off-by-one column
alignment issues that failed docs-site-checks CI:
- architecture.md: outer box is 71 cols but inner-box content lines
and border corners were offset by 1 col, making content-line right
border at col 70/72 while top/bottom border was at col 71. Inner
boxes also had border corners at cols 19/36/53 but content pipes
at cols 20/37/54. Rewrote the diagram with consistent 71-col width
throughout, aligned inner boxes at cols 4-19, 22-37, 40-55 with
2-space gaps and 15-space trailing padding.
- gateway-internals.md: same class of issue — outer box at 51 cols,
inner content lines varied 52-54 cols. Rewrote with consistent
51-col width, inner boxes at cols 4-15, 18-29, 32-43. Also
restructured the bottom-half message flow so it's bare text
(not half-open box cells) matching the intent of the original.
- agent-loop.md line 112-114: box 2 (API thread) content lines had
one extra space pushing the right border to col 46 while the top
and bottom borders of that box sat at col 45. Trimmed one trailing
space from each of the three content lines.
All 123 docs files now pass `npm run lint:diagrams`:
✓ Errors: 0 (warnings: 6, non-fatal)
Pre-existing failures on main — unrelated to any open PR.
* test(setup): accept description kwarg in prompt_choice mock lambdas
setup.py's `_curses_prompt_choice` gained an optional `description`
parameter (used for rendering context hints alongside the prompt).
`prompt_choice` forwards it via keyword arg. The two existing tests
mocked `_curses_prompt_choice` with lambdas that didn't accept the
new kwarg, so the forwarded call raised TypeError.
Fix: add `description=None` to both mock lambda signatures so they
absorb the new kwarg without changing behavior.
* test(matrix): update stale audio-caching assertion
test_regular_audio_has_http_url asserted that non-voice audio
messages keep their HTTP URL and are NOT downloaded/cached. That
was true when the caching code only triggered on
`is_voice_message`. Since bec02f37 (encrypted-media caching
refactor), matrix.py caches all media locally — photos, audio,
video, documents — so downstream tools can read them as real
files via media_urls. This applies to regular audio too.
Renamed the test to `test_regular_audio_is_cached_locally`,
flipped the assertions accordingly, and documented the
intentional behavior change in the docstring. Other tests in
the file (voice-specific caching, message-type detection,
reply-to threading) continue to pass.
* test(413): allow multi-pass preflight compression
run_agent.py's preflight compression runs up to 3 passes in a loop
for very large sessions (each pass summarizes the middle N turns,
then re-checks tokens). The loop breaks when a pass returns a
message list no shorter than its input (can't compress further).
test_preflight_compresses_oversized_history used a static mock
return value that returned the same 2 messages regardless of input,
so the loop ran pass 1 (41 -> 2) and pass 2 (2 -> 2 -> break),
making call_count == 2. The assert_called_once() assertion was
strictly wrong under the multi-pass design.
The invariant the test actually cares about is: preflight ran, and
its first invocation received the full oversized history. Replaced
the count assertion with those two invariants.
* docs: drop '...' from gateway diagram, merge side-by-side boxes
ascii-guard 2.3.0 flagged two remaining issues after the initial fix
pass:
1. gateway-internals.md L33: the '...' suffix after inner box 3's
right border got parsed as 'extra characters after inner-box right
border'. Dropped the '...' — the surrounding prose already conveys
'and more platforms' without needing the visual hint.
2. agent-loop.md: ascii-guard can't cleanly parse two side-by-side
boxes of different heights (main thread 7 rows, API thread 5 rows).
Even equalizing heights didn't help — the linter treats the left
box's right border as the end of the diagram. Merged into a single
54-char-wide outer box with both threads labeled as regions inside,
keeping the ▶ arrow to preserve the main→API flow direction.
Inbound Feishu messages arriving during brief windows when the adapter
loop is unavailable (startup/restart transitions, network-flap reconnect)
were silently dropped with a WARNING log. This matches the symptom in
issue #5499 — and users have reported seeing only a subset of their
messages reach the agent.
Fix: queue pending events in a thread-safe list and spawn a single
drainer thread that replays them once the loop becomes ready. Covers
these scenarios:
* Queue events instead of dropping when loop is None/closed
* Single drainer handles the full queue (not thread-per-event)
* Thread-safe with threading.Lock on the queue and schedule flag
* Handles mid-drain bursts (new events arrive while drainer is working)
* Handles RuntimeError if loop closes between check and submit
* Depth cap (1000) prevents unbounded growth during extended outages
* Drops queue cleanly on disconnect rather than holding forever
* Safety timeout (120s) prevents infinite retention on broken adapters
Based on the approach proposed in #4789 by milkoor, rewritten for
thread-safety and correctness.
Test plan:
* 5 new unit tests (TestPendingInboundQueue) — all passing
* E2E test with real asyncio loop + fake WS thread: 10-event burst
before loop ready → all 10 delivered in order
* E2E concurrent burst test: 20 events queued, 20 more arrive during
drainer dispatch → all 40 delivered, no loss, no duplicates
* All 111 existing feishu tests pass
Related: #5499, #4789
Co-authored-by: milkoor <milkoor@users.noreply.github.com>
* feat(image_gen): multi-model FAL support with picker in hermes tools
Adds 8 FAL text-to-image models selectable via `hermes tools` →
Image Generation → (FAL.ai | Nous Subscription) → model picker.
Models supported:
- fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP)
- fal-ai/flux-2-pro (previous default, kept backward-compat upscaling)
- fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN)
- fal-ai/nano-banana (Gemini 2.5 Flash Image)
- fal-ai/gpt-image-1.5 (with quality tier: low/medium/high)
- fal-ai/ideogram/v3 (best typography)
- fal-ai/recraft-v3 (vector, brand styles)
- fal-ai/qwen-image (LLM-based)
Architecture:
- FAL_MODELS catalog declares per-model size family, defaults, supports
whitelist, and upscale flag. Three size families handled uniformly:
image_size_preset (flux family), aspect_ratio (nano-banana), and
gpt_literal (gpt-image-1.5).
- _build_fal_payload() translates unified inputs (prompt + aspect_ratio)
into model-specific payloads, merges defaults, applies caller overrides,
wires GPT quality_setting, then filters to the supports whitelist — so
models never receive rejected keys.
- IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen
providers (Replicate, Stability, etc.); each provider entry tags itself
with imagegen_backend: 'fal' to select the right catalog.
- Upscaler (Clarity) defaults off for new models (preserves <1s value
prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS.
Config:
image_gen.model = fal-ai/flux-2/klein/9b (new)
image_gen.quality_setting = medium (new, GPT only)
image_gen.use_gateway = bool (existing)
Agent-facing schema unchanged (prompt + aspect_ratio only) — model
choice is a user-level config decision, not an agent-level arg.
Picker uses curses_radiolist (arrow keys, auto numbered-fallback on
non-TTY). Column-aligned: Model / Speed / Strengths / Price.
Docs: image-generation.md rewritten with the model table and picker
walkthrough. tools-reference, tool-gateway, overview updated to drop
the stale "FLUX 2 Pro" wording.
Tests: 42 new in tests/tools/test_image_generation.py covering catalog
integrity, all 3 size families, supports filter, default merging, GPT
quality wiring, model resolution fallback. 8 new in
tests/hermes_cli/test_tools_config.py for picker wiring (registry,
config writes, GPT quality follow-up prompt, corrupt-config repair).
* feat(image_gen): translate managed-gateway 4xx to actionable error
When the Nous Subscription managed FAL proxy rejects a model with 4xx
(likely portal-side allowlist miss or billing gate), surface a clear
message explaining:
1. The rejected model ID + HTTP status
2. Two remediation paths: set FAL_KEY for direct access, or
pick a different model via `hermes tools`
5xx, connection errors, and direct-FAL errors pass through unchanged
(those have different root causes and reasonable native messages).
Motivation: new FAL models added to this release (flux-2-klein-9b,
z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3,
qwen-image) are untested against the Nous Portal proxy. If the portal
allowlists model IDs, users on Nous Subscription will hit cryptic
4xx errors without guidance on how to work around it.
Tests: 8 new cases covering status extraction across httpx/fal error
shapes and 4xx-vs-5xx-vs-ConnectionError translation policy.
Docs: brief note in image-generation.md for Nous subscribers.
Operator action (Nous Portal side): verify that fal-queue-gateway
passes through these 7 new FAL model IDs. If the proxy has an
allowlist, add them; otherwise Nous Subscription users will see the
new translated error and fall back to direct FAL.
* feat(image_gen): pin GPT-Image quality to medium (no user choice)
Previously the tools picker asked a follow-up question for GPT-Image
quality tier (low / medium / high) and persisted the answer to
`image_gen.quality_setting`. This created two problems:
1. Nous Portal billing complexity — the 22x cost spread between tiers
($0.009 low / $0.20 high) forces the gateway to meter per-tier per
user, which the portal team can't easily support at launch.
2. User footgun — anyone picking `high` by mistake burns through
credit ~6x faster than `medium`.
This commit pins quality at medium by baking it into FAL_MODELS
defaults for gpt-image-1.5 and removes all user-facing override paths:
- Removed `_resolve_gpt_quality()` runtime lookup
- Removed `honors_quality_setting` flag on the model entry
- Removed `_configure_gpt_quality_setting()` picker helper
- Removed `_GPT_QUALITY_CHOICES` constant
- Removed the follow-up prompt call in `_configure_imagegen_model()`
- Even if a user manually edits `image_gen.quality_setting` in
config.yaml, no code path reads it — always sends medium.
Tests:
- Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium
(5 tests) — proves medium is baked in, config is ignored, flag is
removed, helper is removed, non-gpt models never get quality.
- Replaced test_picker_with_gpt_image_also_prompts_quality with
test_picker_with_gpt_image_does_not_prompt_quality — proves only 1
picker call fires when gpt-image is selected (no quality follow-up).
Docs updated: image-generation.md replaces the quality-tier table
with a short note explaining the pinning decision.
* docs(image_gen): drop stale 'wires GPT quality tier' line from internals section
Caught in a cleanup sweep after pinning quality to medium. The
"How It Works Internally" walkthrough still described the removed
quality-wiring step.
PR #4918 fixed the double-/v1 bug at fresh agent init by stripping the
trailing /v1 from OpenCode base URLs when api_mode is anthropic_messages
(so the Anthropic SDK's own /v1/messages doesn't land on /v1/v1/messages).
The same logic was missing from the /model mid-session switch path.
Repro: start a session on opencode-go with GLM-5 (or any chat_completions
model), then `/model minimax-m2.7`. switch_model() correctly sets
api_mode=anthropic_messages via opencode_model_api_mode(), but base_url
passes through as https://opencode.ai/zen/go/v1. The Anthropic SDK then
POSTs to https://opencode.ai/zen/go/v1/v1/messages, which returns the
OpenCode website 404 HTML page (title 'Not Found | opencode').
Same bug affects `/model claude-sonnet-4-6` on opencode-zen.
Verified upstream: POST /v1/messages returns clean JSON 401 with x-api-key
auth (route works), while POST /v1/v1/messages returns the exact HTML 404
users reported.
Fix mirrors runtime_provider.resolve_runtime_provider:
- hermes_cli/model_switch.py::switch_model() strips /v1 after the OpenCode
api_mode override when the resolved mode is anthropic_messages.
- run_agent.py::AIAgent.switch_model() applies the same strip as
defense-in-depth, so any direct caller can't reintroduce the double-/v1.
Tests: 9 new regression tests in tests/hermes_cli/test_model_switch_opencode_anthropic.py
covering minimax on opencode-go, claude on opencode-zen, chat_completions
(GLM/Kimi/Gemini) keeping /v1 intact, codex_responses (GPT) keeping /v1
intact, trailing-slash handling, and the agent-level defense-in-depth.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018:
tools/environments/daytona.py
- PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls
against the same sandbox don't collide on the remote temp path
- Move sync_back() inside the cleanup lock and after the _sandbox-None
guard, with its own try/except. Previously a no-op cleanup (sandbox
already cleared) still fired sync_back → 3-attempt retry storm against
a nil sandbox (~6s of sleep). Now short-circuits cleanly.
tools/environments/file_sync.py
- Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a
tar larger than the limit. Protects against runaway sandboxes
producing arbitrary-size archives.
- Add 'nothing previously pushed' guard at the top of sync_back(). If
_pushed_hashes and _synced_files are both empty, the FileSyncManager
was never initialized from the host side — there is nothing coherent
to sync back. Skips the retry/backoff machinery on uninitialized
managers and eliminates test-suite slowdown from pre-existing cleanup
tests that don't mock the sync layer.
tests/tools/test_file_sync_back.py
- Update _make_manager helper to seed a _pushed_hashes entry by default
so sync_back() exercises its real path. A seed_pushed_state=False
opt-out is available for noop-path tests.
- Add TestSyncBackSizeCap with positive and negative coverage of the
new cap.
tests/tools/test_sync_back_backends.py
- Update Daytona bulk download test to assert the PID-suffixed path
pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR #8018 by @alt-glitch onto current main.
On sandbox teardown, FileSyncManager now downloads the remote .hermes/
directory, diffs against SHA-256 hashes of what was originally pushed,
and applies only changed files back to the host.
Core (tools/environments/file_sync.py):
- sync_back(): orchestrates download -> unpack -> diff -> apply with:
- Retry with exponential backoff (3 attempts, 2s/4s/8s)
- SIGINT trap + defer (prevents partial writes on Ctrl-C)
- fcntl.flock serialization (concurrent gateway sandboxes)
- Last-write-wins conflict resolution with warning
- New remote files pulled back via _infer_host_path prefix matching
Backends:
- SSH: _ssh_bulk_download — tar cf - piped over SSH
- Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read
- Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file
- All three call sync_back() at the top of cleanup()
Fixes applied during salvage (vs original PR #8018):
| # | Issue | Fix |
|---|-------|-----|
| C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None |
| W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError |
| W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer |
| W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports |
| W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root |
| S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) |
| S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) |
Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT
main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup).
Based on #8018 by @alt-glitch.
All 61 TUI-related tests green across 3 consecutive xdist runs.
tests/tui_gateway/test_protocol.py:
- rename `get_messages` → `get_messages_as_conversation` on mock DB (method
was renamed in the real backend, test was still stubbing the old name)
- update tool-message shape expectation: `{role, name, context}` matches
current `_history_to_messages` output, not the legacy `{role, text}`
tests/hermes_cli/test_tui_resume_flow.py:
- `cmd_chat` grew a first-run provider-gate that bailed to "Run: hermes
setup" before `_launch_tui` was ever reached; 3 tests stubbed
`_resolve_last_session` + `_launch_tui` but not the gate
- factored a `main_mod` fixture that stubs `_has_any_provider_configured`,
reused by all three tests
tests/test_tui_gateway_server.py:
- `test_config_set_personality_resets_history_and_returns_info` was flaky
under xdist because the real `_write_config_key` touches
`~/.hermes/config.yaml`, racing with any other worker that writes
config. Stub it in the test.
The Discord voice receive path skipped RFC 3550 §5.1 padding handling,
passing padding-contaminated payloads into DAVE E2EE decrypt and Opus
decode. Symptoms in live VC sessions: deaf inbound speech, intermittent
empty STT results, "corrupted stream" decode errors — especially on the
first reply after join.
When the P bit is set in the RTP header, the last payload byte holds the
count of trailing padding bytes (including itself) that must be removed.
Receive pipeline now follows the spec order:
1. RTP header parse
2. NaCl transport decrypt (aead_xchacha20_poly1305_rtpsize)
3. strip encrypted RTP extension data from start
4. strip RTP padding from end if P bit set ← was missing
5. DAVE inner media decrypt
6. Opus decode
Drops malformed packets where pad_len is 0 or exceeds payload length.
Adds 7 integration tests covering valid padded packets, the X+P combined
case, padding under DAVE passthrough, and three malformed-padding paths.
Closes#11267
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the dashboard connects to a remote gateway via GATEWAY_HEALTH_URL,
display the URL instead of the remote PID (which is meaningless locally).
Falls back to PID display for local gateways as before.
- Backend: expose gateway_health_url in /api/status response
- Frontend: prefer gateway_health_url over PID in gatewayValue()
- Add truncate + title tooltip for long URLs that overflow the card
- Add min-w-0/overflow-hidden on status cards for proper truncation
- Tests: verify gateway_health_url in remote and no-URL scenarios
Two new tests in tests/run_agent/ that pin the user-visible invariant
behind AlexKucera's Discord report (2026-04-16): no matter how a future
keepalive / transport fix for #10324 plumbs sockets in, sequential
chats on the same AIAgent instance must all succeed.
test_create_openai_client_reuse.py (no network, runs in CI):
- test_second_create_does_not_wrap_closed_transport_from_first
back-to-back _create_openai_client calls must not hand the same
http_client (after an SDK close) to the second construction
- test_replace_primary_openai_client_survives_repeated_rebuilds
three sequential rebuilds via the real _replace_primary_openai_client
entrypoint must each install a live client
test_sequential_chats_live.py (opt-in, HERMES_LIVE_TESTS=1):
- test_three_sequential_chats_across_client_rebuild
real OpenRouter round trips, with an explicit
_replace_primary_openai_client call between turns 2 and 3.
Error-sentinel detector treats 'API call failed after 3 retries'
replies as failures instead of letting them pass the naive
truthy check (which is how a first draft of this test missed
the bug it was meant to catch).
Validation:
clean main (post-revert, defensive copy present)
-> all 4 tests PASS
broken #10933 state (keepalive injection, no defensive copy)
-> all 4 tests FAIL with precise messages pointing at #10933
Companion to taeuk178's test_create_openai_client_kwargs_isolation.py,
which pins the syntactic 'don't mutate input dict' half of the same
contract. Together they catch both the specific mechanism of #10933
and any other reimplementation that breaks the sequential-call
invariant.
* fix(cli): stop approval panel from clipping approve/deny off-screen
The dangerous-command approval panel had an unbounded Window height with
choices at the bottom. When tirith findings produced long descriptions or
the terminal was compact, HSplit clipped the bottom of the widget — which
is exactly where approve/session/always/deny live. Users were asked to
decide on commands without being able to see the choices (and sometimes
the command itself was hidden too).
Fix: reorder the panel so title → command → choices render first, with
description last. Budget vertical rows so the mandatory content (command
and every choice) always fits, and truncate the description to whatever
row budget is left. Handle three edge cases:
- Long description in a normal terminal: description gets truncated at
the bottom with a '… (description truncated)' marker. Command and
all four choices always visible.
- Compact terminal (≤ ~14 rows): description dropped entirely. Command
and choices are the only content, no overflow.
- /view on a giant command: command gets truncated with a marker so
choices still render. Keeps at least 2 rows of command.
Same row-budgeting pattern applied to the clarify widget, which had the
identical structural bug (long question would push choices off-screen).
Adds regression tests covering all three scenarios.
* fix(cli): add compact chrome mode for approval/clarify panels on short terminals
Live PTY test at 100x14 rows revealed reserved_below=4 was too optimistic
— the spinner/tool-progress line, status bar, input area, separators, and
prompt symbol actually consume ~6 rows below the panel. At 14 rows, the
panel still got 'Deny' clipped off the bottom.
Fix: bump reserved_below to 6 (measured from live PTY output) and add a
compact-chrome mode that drops the blank separators between title/command
and command/choices when the full-chrome panel wouldn't fit. Chrome goes
from 5 rows to 3 rows in tight mode, keeping command + all 4 choices on
screen in terminals as small as ~13 rows.
Same compact-chrome pattern applied to the clarify widget.
Verified live in PTY hermes chat sessions at 100x14 (compact chrome
triggered, all choices visible) and 100x30 (full chrome with blanks, nice
spacing) by asking the agent to run 'rm -rf /tmp/sandbox'.
---------
Co-authored-by: Teknium <teknium@nousresearch.com>
Users with 'commit.gpgsign = true' in their global git config got a
pinentry popup (or a failed commit) every time the agent took a
background filesystem snapshot — every write_file, patch, or diff
mid-session. With GPG_TTY unset, pinentry-qt/gtk would spawn a GUI
window, constantly interrupting the session.
The shadow repo is internal Hermes infrastructure. It must not
inherit user-level git settings (signing, hooks, aliases, credential
helpers, etc.) under any circumstance.
Fix is layered:
1. _git_env() sets GIT_CONFIG_GLOBAL=os.devnull,
GIT_CONFIG_SYSTEM=os.devnull, and GIT_CONFIG_NOSYSTEM=1. Shadow
git commands no longer see ~/.gitconfig or /etc/gitconfig at all
(uses os.devnull for Windows compat).
2. _init_shadow_repo() explicitly writes commit.gpgsign=false and
tag.gpgSign=false into the shadow's own config, so the repo is
correct even if inspected or run against directly without the
env vars, and for older git versions (<2.32) that predate
GIT_CONFIG_GLOBAL.
3. _take() passes --no-gpg-sign inline on the commit call. This
covers existing shadow repos created before this fix — they will
never re-run _init_shadow_repo (it is gated on HEAD not existing),
so they would miss layer 2. Layer 1 still protects them, but the
inline flag guarantees correctness at the commit call itself.
Existing checkpoints, rollback, list, diff, and restore all continue
to work — history is untouched. Users who had the bug stop getting
pinentry popups; users who didn't see no observable change.
Tests: 5 new regression tests in TestGpgAndGlobalConfigIsolation,
including a full E2E repro with fake HOME, global gpgsign=true, and
a deliberately broken GPG binary — checkpoint succeeds regardless.
* - make buffered streaming
- fix path naming to expand `~` for agent.
- fix stripping of matrix ID to not remove other mentions / localports.
* fix(matrix): register MembershipEventDispatcher for invite auto-join
The mautrix migration (#7518) broke auto-join because InternalEventType.INVITE
events are only dispatched when MembershipEventDispatcher is registered on the
client. Without it, _on_invite is dead code and the bot silently ignores all
room invites.
Closes#10094Closes#10725
Refs: PR #10135 (digging-airfare-4u), PR #10732 (fxfitz)
* fix(matrix): preserve _joined_rooms reference for CryptoStateStore
connect() reassigned self._joined_rooms = set(...) after initial sync,
orphaning the reference captured by _CryptoStateStore at init time.
find_shared_rooms() returned [] forever, breaking Megolm session rotation
on membership changes.
Mutate in place with clear() + update() so the CryptoStateStore reference
stays valid.
Refs #8174, PR #8215
* fix(matrix): remove dual ROOM_ENCRYPTED handler to fix dedup race
mautrix auto-registers DecryptionDispatcher when client.crypto is set.
The adapter also registered _on_encrypted_event for the same event type.
_on_encrypted_event had zero awaits and won the race to mark event IDs
in the dedup set, causing _on_room_message to drop successfully decrypted
events from DecryptionDispatcher. The retry loop masked this by re-decrypting
every message ~4 seconds later.
Remove _on_encrypted_event entirely. DecryptionDispatcher handles decryption;
genuinely undecryptable events are logged by mautrix and retried on next
key exchange.
Refs #8174, PR #8215
* fix(matrix): re-verify device keys after share_keys() upload
Matrix homeservers treat ed25519 identity keys as immutable per device.
share_keys() can return 200 but silently ignore new keys if the device
already exists with different identity keys. The bot would proceed with
shared=True while peers encrypt to the old (unreachable) keys.
Now re-queries the server after share_keys() and fails closed if keys
don't match, with an actionable error message.
Refs #8174, PR #8215
* fix(matrix): encrypt outbound attachments in E2EE rooms
_upload_and_send() uploaded raw bytes and used the 'url' key for all
rooms. In E2EE rooms, media must be encrypted client-side with
encrypt_attachment(), the ciphertext uploaded, and the 'file' key
(with key/iv/hashes) used instead of 'url'.
Now detects encrypted rooms via state_store.is_encrypted() and
branches to the encrypted upload path.
Refs: PR #9822 (charles-brooks)
* fix(matrix): add stop_typing to clear typing indicator after response
The adapter set a 30-second typing timeout but never cleared it.
The base class stop_typing() is a no-op, so the typing indicator
lingered for up to 30 seconds after each response.
Closes#6016
Refs: PR #6020 (r266-tech)
* fix(matrix): cache all media types locally, not just photos/voice
should_cache_locally only covered PHOTO, VOICE, and encrypted media.
Unencrypted audio/video/documents in plaintext rooms were passed as MXC
URLs that require authentication the agent doesn't have, resulting
in 401 errors.
Refs #3487, #3806
* fix(matrix): detect stale OTK conflict on startup and fail closed
When crypto state is wiped but the same device ID is reused, the
homeserver may still hold one-time keys signed with the previous
identity key. Identity key re-upload succeeds but OTK uploads fail
with "already exists" and a signature mismatch. Peers cannot
establish new Olm sessions, so all new messages are undecryptable.
Now proactively flushes OTKs via share_keys() during connect() and
catches the "already exists" error with an actionable log message
telling the operator to purge the device from the homeserver or
generate a fresh device ID.
Also documents the crypto store recovery procedure in the Matrix
setup guide.
Refs #8174
* docs(matrix): improve crypto recovery docs per review
- Put easy path (fresh access token) first, manual purge second
- URL-encode user ID in Synapse admin API example
- Note that device deletion may invalidate the access token
- Add "stop Synapse first" caveat for direct SQLite approach
- Mention the fail-closed startup detection behavior
- Add back-reference from upgrade section to OTK warning
* refactor(matrix): cleanup from code review
- Extract _extract_server_ed25519() and _reverify_keys_after_upload()
to deduplicate the re-verification block (was copy-pasted in two
places, three copies of ed25519 key extraction total)
- Remove dead code: _pending_megolm, _retry_pending_decryptions,
_MAX_PENDING_EVENTS, _PENDING_EVENT_TTL — all orphaned after
removing _on_encrypted_event
- Remove tautological TestMediaCacheGate (tested its own predicate,
not production code)
- Remove dead TestMatrixMegolmEventHandling and
TestMatrixRetryPendingDecryptions (tested removed methods)
- Merge duplicate TestMatrixStopTyping into TestMatrixTypingIndicator
- Trim comment to just the "why"
The blocking gateway approval wait at tools/approval.py called
`entry.event.wait(timeout=...)` which never touched the agent's
activity tracker. When a user was slow to respond to a /approve prompt
(or the gateway_timeout config was set higher than the default 300s),
the agent thread sat silent long enough for the gateway's inactivity
watchdog (agent.gateway_timeout, default 1800s) to kill it — even
though the agent was doing exactly the right thing and the user was
the one causing the delay.
The fix polls the event in 1s slices and calls touch_activity_if_due
between slices, mirroring the _wait_for_process() pattern in
tools/environments/base.py that covers the subprocess-waiting side of
the same problem. At the default 10s heartbeat cadence, a 300s
approval wait now pings activity ~30 times, well under the 1800s
idle threshold.
Observed in community user logs: 12 repeated 'Agent idle 1800s,
last_activity=executing tool: terminal' events across April 12-14.
Companion to PR #10501 which covered streaming / concurrent-tool /
Modal-backend gaps but did not touch approval.py.
Test: tests/tools/test_approval_heartbeat.py — verifies (1) heartbeats
fire during the wait, (2) user responses are still near-instant, and
(3) the approval path stays functional when the heartbeat helper
can't be imported.
Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt
voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language
prompt control. Integrates through the existing provider chain:
- tools/tts_tool.py: new _generate_gemini_tts() calls the
generativelanguage REST endpoint with responseModalities=[AUDIO],
wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header,
then ffmpeg-converts to MP3 or Opus depending on output extension.
For .ogg output, libopus is forced explicitly so Telegram voice
bubbles get Opus (ffmpeg defaults to Vorbis for .ogg).
- hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider
option in the curses-based 'hermes tools' UI.
- hermes_cli/setup.py: adds gemini to the setup wizard picker, tool
status display, and API key prompt branch (accepts existing
GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither set).
- tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header
wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model
overrides, snake_case vs camelCase inlineData handling, HTTP error
surfacing, and empty-audio edge cases.
- docs: TTS features page updated to list seven providers with the new
gemini config block and ffmpeg notes.
Live-tested against api key against gemini-2.5-flash-preview-tts: .wav,
.mp3, and Telegram-compatible .ogg (Opus codec) all produce valid
playable audio.
Replace the HERMES_ENABLE_NOUS_MANAGED_TOOLS env-var feature flag with
subscription-based detection. The Tool Gateway is now available to any
paid Nous subscriber without needing a hidden env var.
Core changes:
- managed_nous_tools_enabled() checks get_nous_auth_status() +
check_nous_free_tier() instead of an env var
- New use_gateway config flag per tool section (web, tts, browser,
image_gen) records explicit user opt-in and overrides direct API
keys at runtime
- New prefers_gateway(section) shared helper in tool_backend_helpers.py
used by all 4 tool runtimes (web, tts, image gen, browser)
UX flow:
- hermes model: after Nous login/model selection, shows a curses
prompt listing all gateway-eligible tools with current status.
User chooses to enable all, enable only unconfigured tools, or skip.
Defaults to Enable for new users, Skip when direct keys exist.
- hermes tools: provider selection now manages use_gateway flag —
selecting Nous Subscription sets it, selecting any other provider
clears it
- hermes status: renamed section to Nous Tool Gateway, added
free-tier upgrade nudge for logged-in free users
- curses_radiolist: new description parameter for multi-line context
that survives the screen clear
Runtime behavior:
- Each tool runtime (web_tools, tts_tool, image_generation_tool,
browser_use) checks prefers_gateway() before falling back to
direct env-var credentials
- get_nous_subscription_features() respects use_gateway flags,
suppressing direct credential detection when the user opted in
Removed:
- HERMES_ENABLE_NOUS_MANAGED_TOOLS env var and all references
- apply_nous_provider_defaults() silent TTS auto-set
- get_nous_subscription_explainer_lines() static text
- Override env var warnings (use_gateway handles this properly now)
Regression from #11161 (Claude Opus 4.7 migration, commit 0517ac3e).
The Opus 4.7 migration changed `ADAPTIVE_EFFORT_MAP["xhigh"]` from "max"
(the pre-migration alias) to "xhigh" to preserve the new 4.7 effort level
as distinct from max. This is correct for 4.7, but Opus/Sonnet 4.6 only
expose 4 levels (low/medium/high/max) — sending "xhigh" there now 400s:
BadRequestError [HTTP 400]: This model does not support effort
level 'xhigh'. Supported levels: high, low, max, medium.
Users who set reasoning_effort=xhigh as their default (xhigh is the
recommended default for coding/agentic on 4.7 per the Anthropic migration
guide) now 400 every request the moment they switch back to a 4.6 model
via `/model` or config. Verified live against the Anthropic API on
`anthropic==0.94.0`.
Fix: make the mapping model-aware. Add `_supports_xhigh_effort()`
predicate (matches 4-7/4.7 substrings, mirroring the existing
`_supports_adaptive_thinking` / `_forbids_sampling_params` pattern).
On pre-4.7 adaptive models, downgrade xhigh→max (the strongest effort
those models accept, restoring pre-migration behavior). On 4.7+, keep
xhigh as a distinct level.
Per Anthropic's migration guide, xhigh is 4.7-only:
https://platform.claude.com/docs/en/about-claude/models/migration-guide
> Opus 4.7 effort levels: max, xhigh (new), high, medium, low.
> Opus 4.6 effort levels: max, high, medium, low.
SDK typing confirms: `anthropic.types.OutputConfigParam.effort: Literal[
"low", "medium", "high", "max"]` (v0.94.0 not yet updated for xhigh).
## Test plan
Verified live on macOS 15.5 / anthropic==0.94.0:
claude-opus-4-6 + effort=xhigh → output_config.effort=max → 200 OK
claude-opus-4-7 + effort=xhigh → output_config.effort=xhigh → 200 OK
claude-opus-4-6 + effort=max → output_config.effort=max → 200 OK
claude-opus-4-7 + effort=max → output_config.effort=max → 200 OK
`tests/agent/test_anthropic_adapter.py` — 120 pass (replaced 1 bugged
test that asserted the broken behavior, added 1 for 4.7 preservation).
Full adapter suite: 120 passed in 1.05s.
Broader suite (agent + run_agent + cli/gateway reasoning): 2140 passed
(2 pre-existing failures on clean upstream/main, unrelated).
## Platforms
Tested on macOS 15.5. No platform-specific code paths touched.
Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guideFixesNousResearch/hermes-agent#11137
Breaking-change coverage:
1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
_supports_adaptive_thinking() (extends previous 4.6-only gate).
2. Sampling parameter stripping — 4.7 returns 400 for any non-default
temperature / top_p / top_k. build_anthropic_kwargs drops them as a
safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
and AnthropicCompletionsAdapter.create() both early-exit before
setting temperature for 4.7+ models. This keeps flush_memories and
structured-JSON aux paths that hardcode temperature from 400ing
when the aux model is flipped to 4.7.
3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
which silently hides reasoning text from Hermes's CLI activity feed
during long tool runs. Restoring "summarized" preserves 4.6 UX.
4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
silently over-efforted every coding/agentic request). max is now a
distinct ceiling per Anthropic's 5-level effort model.
5. New stop_reason values — refusal and model_context_window_exceeded
were silently collapsed to "stop" (end_turn) by the adapter's
stop_reason_map. Now mapped to "content_filter" and "length"
respectively, matching upstream finish-reason handling already in
bedrock_adapter.
6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
catalog (recommended), claude-opus-4-7 added to model_metadata
DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).
7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
prefill (400).
8. Tests — 4 new tests in test_anthropic_adapter covering display
default, xhigh preservation, max on 4.7, refusal / context-overflow
stop_reason mapping, plus the sampling-param predicate. test_model_metadata
accepts 4.7 at 1M context.
Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.
Models may send whitespace-only strings like {"conclusion": " "} which
pass bool() but create meaningless conclusions. Strip both inputs so
whitespace-only values are treated as empty.
Adds tests for whitespace-only conclusion and delete_id.
Reviewed-by: @erosika
Improve honcho_conclude tool descriptions to explicitly tell the model
not to send both params together. Add runtime validation that rejects
calls with both or neither of conclusion/delete_id. Add schema
regression test and both-params rejection test.
Consolidates #10847 by @ygd58, #10864 by @cola-runner,
#10870 by @vominh1919, and #10952 by @ogzerber.
The anyOf removal itself was already merged; this adds the
runtime validation and tests those PRs contributed.
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
Co-authored-by: cola-runner <cola-runner@users.noreply.github.com>
Co-authored-by: vominh1919 <vominh1919@users.noreply.github.com>
Initialize next_channel_prompt before the pending_event check and use
getattr with None default, matching the existing pattern for
next_source/next_message/next_message_id. Prevents AttributeError
when pending_event is None (interrupt path).
Cherry-picked from #10953 by @jackjin1997.
Shallow-copy client_kwargs at the top of _create_openai_client() to
prevent in-place mutation from leaking back into self._client_kwargs.
Defensive fix that locks the contract for future httpx/transport work.
Cherry-picked from #10978 by @taeuk178.
resolve_vision_provider_client() was receiving the raw call_llm
parameters instead of the resolved provider/model/key/url from
_resolve_task_provider_model(). This caused config overrides
(auxiliary.vision.provider, etc.) to be silently discarded.
Cherry-picked from #10901 by @lrawnsley.
provider_model_ids() and list_authenticated_providers() had no case for
"ollama-cloud", so the /model slash command showed 0 models despite
fetch_ollama_cloud_models() being fully implemented. The CLI subcommand
worked because it called fetch_ollama_cloud_models() directly.
- Add ollama-cloud case to provider_model_ids() in models.py
- Populate curated dict for ollama-cloud in list_authenticated_providers()
- Add tests for both code paths
When a cron job's agent run completes but produces an empty final_response
(e.g. API 404 from invalid model name), the scheduler now marks last_status
as "error" instead of "ok", so the failure is visible in job listings.
Previously, any run that didn't raise an exception was marked "ok" regardless
of whether the agent actually produced output.
Group A (3 tests): 'No LLM provider configured' RuntimeError
- test_user_message_surrogates_sanitized, test_counters_initialized_in_init,
test_openai_prompt_tokens_unchanged
- Root cause: AIAgent.__init__ now requires base_url alongside api_key to
skip resolve_provider_client() (which returns None when API keys are
blanked in CI). Added base_url='http://localhost:1234/v1' to test
agent construction.
Group B (5 tests): Discord slash command auto-registration
- test_auto_registers_missing_gateway_commands, test_auto_registered_command_*,
test_register_skill_group_*
- Root cause: xdist workers that loaded a discord mock WITHOUT
app_commands.Command/Group caused _register_slash_commands() to fail
silently. Added comprehensive shared discord mock in
tests/gateway/conftest.py (same pattern as existing telegram mock).
Group C (5 errors): Discord reply mode 'NoneType has no DMChannel'
- All TestReplyToText tests
- Root cause: FakeDMChannel was not a subclass of real discord.DMChannel,
so isinstance() checks in _handle_message failed when running in full
suite (real discord installed). Made FakeDMChannel inherit from
discord.DMChannel when available. Removed fragile monkeypatch approach.
Group D (2 tests): detect_provider_for_model wrong provider
- test_openrouter_slug_match (got 'ai-gateway'), test_bare_name_gets_
openrouter_slug (got 'copilot')
- Root cause: ai-gateway, copilot, and kilocode are multi-vendor
aggregators that list other providers' models (OpenRouter-style slugs).
They were being matched in Step 1 before OpenRouter. Added all three
to _AGGREGATORS set so they're skipped like nous/openrouter.
Group E (1 test): model_flow_custom StopIteration
- test_model_flow_custom_saves_verified_v1_base_url
- Root cause: 'Display name' prompt was added after the test was written.
The input iterator had 5 answers but the flow now asks 6 questions.
Added 6th empty string answer.
Group F (1 test): Telegram proxy env assertion
- test_uses_proxy_env_for_primary_and_fallback_transports
- Root cause: _resolve_proxy_url() now checks TELEGRAM_PROXY first
(via resolve_proxy_url('TELEGRAM_PROXY')). Test didn't clear this
env var, allowing potential leakage from other tests in xdist workers.
Added TELEGRAM_PROXY to the cleanup list.
config.yaml terminal.cwd is now the single source of truth for working
directory. MESSAGING_CWD and TERMINAL_CWD in .env are deprecated with a
migration warning.
Changes:
1. config.py: Remove MESSAGING_CWD from OPTIONAL_ENV_VARS (setup wizard
no longer prompts for it). Add warn_deprecated_cwd_env_vars() that
prints a migration hint when deprecated env vars are detected.
2. gateway/run.py: Replace all MESSAGING_CWD reads with TERMINAL_CWD
(which is bridged from config.yaml terminal.cwd). MESSAGING_CWD is
still accepted as a backward-compat fallback with deprecation warning.
Config bridge skips cwd placeholder values so they don't clobber
the resolved TERMINAL_CWD.
3. cli.py: Guard against lazy-import clobbering — when cli.py is
imported lazily during gateway runtime (via delegate_tool), don't
let load_cli_config() overwrite an already-resolved TERMINAL_CWD
with os.getcwd() of the service's working directory. (#10817)
4. hermes_cli/main.py: Add 'hermes memory reset' command with
--target all/memory/user and --yes flags. Profile-scoped via
HERMES_HOME.
Migration path for users with .env settings:
Remove MESSAGING_CWD / TERMINAL_CWD from .env
Add to config.yaml:
terminal:
cwd: /your/project/path
Addresses: #10225, #4672, #10817, #7663
The gateway compression notifications were already removed in commit cc63b2d1
(PR #4139), but the agent-level context pressure warnings (85%/95% tiered
alerts via _emit_context_pressure) were still firing on both CLI and gateway.
Removed:
- _emit_context_pressure method and all call sites in run_conversation()
- Class-level dedup state (_context_pressure_last_warned, _CONTEXT_PRESSURE_COOLDOWN)
- Instance attribute _context_pressure_warned_at
- Pressure reset logic in _compress_context
- format_context_pressure and format_context_pressure_gateway from agent/display.py
- Orphaned ANSI constants that only served these functions
- tests/run_agent/test_context_pressure.py (all 361 lines)
Compression itself continues to run silently in the background.
Closes#3784
- Extract duplicated activity-callback polling into shared
touch_activity_if_due() helper in tools/environments/base.py
- Use helper from both base.py _wait_for_process and
code_execution_tool.py local polling loop (DRY)
- Add test assertion that timeout output field contains the
timeout message and emoji (#10807)
- Add stream_consumer test for tool-boundary fallback scenario
where continuation is empty but final_text differs from
visible prefix (#10807)
display: null or display: <non-dict> in config.yaml crashed skin init
with AttributeError. Now falls back to default skin gracefully.
Cherry-picked from #10867 by @Bartok9. Consolidates #10876 by @cola-runner.
Co-authored-by: cola-runner <cola-runner@users.noreply.github.com>
Each top-level Slack DM now gets its own Hermes session, matching the
per-thread behavior channels already have. Previously all top-level DM
messages shared one continuous session because thread_ts was None,
causing context to accumulate across unrelated conversations.
The behavior is controlled by platforms.slack.extra.dm_top_level_threads_as_sessions
in config.yaml (default: true). Set to false to restore legacy behavior.
Based on PR #10789 by helix4u. Changes from original:
- Default flipped to true (was opt-in, now opt-out)
- Removed env var fallback (config.yaml only per project policy)
- Tests updated to cover both default and opt-out paths
Wraps provider.create_session() in _get_session_info() with try/except
to catch cloud provider runtime failures (timeouts, auth errors, rate
limits, invalid responses). Falls back to _create_local_session() so
browser automation continues working when cloud APIs are down.
Marks fallback sessions with fallback_from_cloud, fallback_reason, and
fallback_provider metadata for observability. If both cloud and local
fail, raises RuntimeError with chained context from both errors.
Closes#10883
Co-authored-by: konsisumer <konsisumer@users.noreply.github.com>
Camofox automatically maps each userId to a persistent Firefox profile
on the server side — no CAMOFOX_PROFILE_DIR env var exists. Our docs
incorrectly told users to configure this on the server.
Removed the fabricated env var from:
- browser docs (:::note block)
- config.py DEFAULT_CONFIG comment
- test docstring
Three targeted fixes for the 'agent stuck on terminal command' report:
1. **Concurrent tool wait loop now checks interrupts** (run_agent.py)
The sequential path checked _interrupt_requested before each tool call,
but the concurrent path's wait loop just blocked with 30s timeouts.
Now polls every 5s and cancels pending futures on interrupt, giving
already-running tools 3s to notice the per-thread interrupt signal.
2. **Cancelled concurrent tools get proper interrupt messages** (run_agent.py)
When a concurrent tool is cancelled or didn't return a result due to
interrupt, the tool result message says 'skipped due to user interrupt'
instead of a generic error.
3. **Typing indicator fires before follow-up turn** (gateway/run.py)
After an interrupt is acknowledged and the pending message dequeued,
the gateway now sends a typing indicator before starting the recursive
_run_agent call. This gives the user immediate visual feedback that
the system is processing their new message (closing the perceived
'dead air' gap between the interrupt ack and the response).
Reported by @_SushantSays.
Fixes 12 CI test failures:
1. test_cli_new_session (4): _FakeAgent missing commit_memory_session
attribute added in the memory provider refactoring. Added MagicMock.
2. test_run_progress_topics (1): already_sent detection only checked
stream consumer flags, missing the response_previewed path from
interim_assistant_callback. Restructured guard to check both paths.
3. test_timezone (1): HERMES_TIMEZONE leaked into child processes via
_SAFE_ENV_PREFIXES matching HERMES_*. The code correctly converts
it to TZ but didn't remove the original. Added child_env.pop().
4. test_session_env (1): contextvars baseline captured from a different
context couldn't be restored after clear. Changed assertion to verify
the test's value was removed rather than comparing to a fragile baseline.
5. test_discord_slash_commands (5): already fixed on current main.
When a user enters a local model server URL (Ollama, vLLM, llama.cpp)
without a /v1 suffix during 'hermes model' custom endpoint setup,
prompt them to add it. Most OpenAI-compatible local servers require
/v1 in the base URL for chat completions to work.
When a custom/Ollama provider is used and reasoning_effort is set to 'none'
(or enabled: false), inject 'think': false into the request extra_body.
Ollama does not recognise the OpenRouter-style 'reasoning' extra_body field,
so thinking-capable models (Qwen3, etc.) generate <think> blocks regardless
of the reasoning_effort setting. This produces empty-response errors that
corrupt session state.
The fix adds a provider-specific block in _build_api_kwargs() that sets
think=false in extra_body whenever self.provider == 'custom' and reasoning
is explicitly disabled.
Closes#3191
Gateway executor work now inherits the active session contextvars via
copy_context() so background process watchers retain the correct
platform/chat/user/session metadata for routing completion events back
to the originating chat.
Cherry-picked from #10647 by @helix4u with:
- Use asyncio.get_running_loop() instead of deprecated get_event_loop()
- Strip trailing whitespace
- Add *args forwarding test
- Add exception propagation test
Recomputes GitHub Copilot api_mode from the selected model in the
shared /model switch path. Before this change, Copilot could carry a
stale codex_responses mode forward from a GPT-5 selection into a later
Claude model switch, causing unsupported_api_for_model errors.
Cherry-picked from #10533 by @helix4u with:
- Comment specificity (Provider-specific → Copilot api_mode override)
- Fix pre-existing duplicate opencode-go in set literal
- Extract test mock helper to reduce duplication
- Add GPT-5 → GPT-5 regression test (keeps codex_responses)
In Telegram forum-enabled groups, the General topic does not include
message_thread_id in incoming messages (it is None). This caused:
1. Messages in General losing thread context — replies went to wrong place
2. Typing indicator failing because thread_id=1 was rejected by Telegram
Fix: synthesize thread_id="1" for forum groups when message_thread_id
is None, then handle it correctly per operation:
- send: omit message_thread_id (Telegram rejects thread_id=1 for sends)
- typing: pass thread_id=1, retry without it on "thread not found"
Also centralizes thread_id extraction into _metadata_thread_id() across
all send methods (send, send_voice, send_image, send_document, send_video,
send_animation, send_photo), replacing ~10 duplicate patterns.
Salvaged from PR #7892 by @corazzione.
Closes#7877, closes#7519.
Expands the plugin interface so slash command handlers can dispatch tool
calls through the registry with parent agent context wired up automatically.
This is the public API for plugins that need to orchestrate tools like
delegate_task — they call ctx.dispatch_tool() instead of reaching into
framework internals. The parent agent is resolved lazily from _cli_ref
when available (CLI mode) and omitted in gateway mode (tools degrade
gracefully).
Enables the hermes-deliver-plugin pattern where /deliver and /fanout
slash commands spawn subagents via delegate_task without touching the
agent conversation loop.
7 new tests covering: registry delegation, parent_agent injection from
cli_ref, gateway mode (no cli_ref), uninitialized agent, explicit
parent_agent override, kwargs forwarding, return value passthrough.
Pass platform_env_var="TELEGRAM_PROXY" to resolve_proxy_url() in both
telegram.py (main connect) and telegram_network.py (fallback transport),
so a Telegram-specific proxy takes priority over the generic HTTPS_PROXY.
Also bridge telegram.proxy_url from config.yaml to the TELEGRAM_PROXY
env var (env var takes precedence if both are set), add OPTIONAL_ENV_VARS
entry, docs, and tests.
Composite salvage of four community PRs:
- Core approach (both call sites): #9414 by @leeyang1990
- config.yaml bridging + docs: #6530 by @WhiteWorld
- Naming convention: #9074 by @brantzh6
- Earlier proxy work: #7786 by @ten-ltw
Closes#9414, closes#9074, closes#7786, closes#6530
Co-authored-by: WhiteWorld <WhiteWorld@users.noreply.github.com>
Co-authored-by: brantzh6 <brantzh6@users.noreply.github.com>
Co-authored-by: ten-ltw <ten-ltw@users.noreply.github.com>
Salvaged from PR #10643 by kshitijk4poor, updated for current main.
Root causes fixed:
1. Telegram xdist mock pollution — new tests/gateway/conftest.py with shared
mock that runs at collection time (prevents ChatType=None caching)
2. VIRTUAL_ENV env var leak — monkeypatch.delenv in _detect_venv_dir tests
3. Copilot base_url missing — add fallback in _resolve_runtime_from_pool_entry
4. Stale vision model assertion — zai now uses glm-5v-turbo
5. Reasoning item id intentionally stripped — assert 'id' not in (store=False)
6. Context length warning unreachable — pass base_url to AIAgent in test
7. Kimi provider label updated — 'Kimi / Kimi Coding Plan' matches models.py
8. Google Workspace calendar tests — rewritten for current production code,
properly mock subprocess on api_module, removed stale +agenda assertions
9. Credential pool auto-seeding — mock _select_pool_entry / _resolve_auto /
_import_codex_cli_tokens to prevent real credentials from leaking into tests
Complete the half-built plugin slash command system. The dispatch
code in cli.py and gateway/run.py already called
get_plugin_command_handler() but the registration side was never
implemented.
Changes:
- Add register_command() to PluginContext — stores handler,
description, and plugin name; normalizes names; rejects conflicts
with built-in commands
- Add _plugin_commands dict to PluginManager
- Add commands_registered tracking on LoadedPlugin
- Add get_plugin_command_handler() and get_plugin_commands()
module-level convenience functions
- Fix commands.py to use actual plugin description in Telegram
bot menu (was hardcoded 'Plugin command')
- Add plugin commands to SlashCommandCompleter autocomplete
- Show command count in /plugins display
- 12 new tests covering registration, conflict detection,
normalization, handler dispatch, and introspection
Closes#10495
Telegram on iOS auto-converts double hyphens (--) to em dashes (—)
or en dashes (–) via autocorrect. This breaks /model flag parsing
since parse_model_flags() only recognizes literal '--provider' and
'--global'.
When the flag isn't parsed, the entire string (e.g. 'glm-5.1 —provider zai')
gets treated as the model name and fails with 'Model names cannot
contain spaces.'
Fix: normalize Unicode dashes (U+2012-U+2015) to '--' when they
appear before flag keywords (provider, global), before flag extraction.
The existing test suite in test_model_switch_provider_routing.py
already covers all four dash variants — this commit adds the code
that makes them pass.
Replace inline Path.home() / '.hermes' / 'profiles' detection in both CLI
and gateway /profile handlers with the existing get_active_profile_name()
from hermes_cli.profiles — which already handles custom-root deployments,
standard profiles, and Docker layouts.
Fixes /profile incorrectly reporting 'default' when HERMES_HOME points to
a custom-root profile path like /opt/data/profiles/coder.
Based on PR #10484 by Xowiek.
Text-only Matrix sends should continue using the lightweight _send_matrix()
HTTP helper (~100ms). Only route through the heavy MatrixAdapter (full sync +
E2EE setup) when media files are present. Adds test verifying text-only
messages don't take the adapter path.
Matrix media delivery was silently dropped by send_message because Matrix
wasn't wired into the native adapter-backed media path. Only Telegram,
Discord, and Weixin had native media support.
Adds _send_matrix_via_adapter() which creates a MatrixAdapter instance,
connects, sends text + media via the adapter's native upload methods
(send_document, send_image_file, send_video, send_voice), then disconnects.
Also fixes a stale URL-encoding assertion in test_send_message_missing_platforms
that broke after PR #10151 added quote() to room IDs.
Cherry-picked from PR #10486 by helix4u.
Background review notifications ("💾 Skill created", "💾 Memory updated")
could race ahead of the main assistant reply in chat, making it look like
the agent stopped after creating a skill.
Gate bg-review notifications behind a threading.Event + pending queue.
Register a release callback on the adapter's _post_delivery_callbacks dict
so base.py's finally block fires it after the main response is delivered.
The queued-message path in _run_agent pops and calls the callback directly
to prevent double-fire.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Closes#10541
When Nous returns a 429, the retry amplification chain burns up to 9
API requests per conversation turn (3 SDK retries × 3 Hermes retries),
each counting against RPH and deepening the rate limit. With multiple
concurrent sessions (cron + gateway + auxiliary), this creates a spiral
where retries keep the limit tapped indefinitely.
New module: agent/nous_rate_guard.py
- Shared file-based rate limit state (~/.hermes/rate_limits/nous.json)
- Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit-
reset-requests, retry-after headers, or error context
- Falls back to 5-minute default cooldown if no header data
- Atomic writes (tempfile + rename) for cross-process safety
- Auto-cleanup of expired state files
run_agent.py changes:
- Top-of-retry-loop guard: when another session already recorded Nous
as rate-limited, skip the API call entirely. Try fallback provider
first, then return a clear message with the reset time.
- On 429 from Nous: record rate limit state and skip further retries
(sets retry_count = max_retries to trigger fallback path)
- On success from Nous: clear the rate limit state so other sessions
know they can resume
auxiliary_client.py changes:
- _try_nous() checks rate guard before attempting Nous in the auxiliary
fallback chain. When rate-limited, returns (None, None) so the chain
skips to the next provider instead of piling more requests onto Nous.
This eliminates three sources of amplification:
1. Hermes-level retries (saves 6 of 9 calls per turn)
2. Cross-session retries (cron + gateway all skip Nous)
3. Auxiliary fallback to Nous (compression/session_search skip too)
Includes 24 tests covering the rate guard module, header parsing,
state lifecycle, and auxiliary client integration.
Extract resolve_channel_prompt() shared helper into
gateway/platforms/base.py. Refactor Discord to use it.
Wire channel_prompts into Telegram (groups + forum topics),
Slack (channels), and Mattermost (channels).
Config bridging now applies to all platforms (not just Discord).
Added channel_prompts defaults to telegram/slack/mattermost
config sections.
Docs added to all four platform pages with platform-specific
examples (topic inheritance for Telegram, channel IDs for Slack,
etc.).
Move _ensure_discord_mock() from module level to _make_adapter() so it
doesn't poison sys.modules for other discord test files. Use
types.ModuleType instead of MagicMock for the mock module to avoid
auto-generated __file__ attribute confusing hasattr checks.
Add BrennerSpear to AUTHOR_MAP.
- Remove double str() normalization in _resolve_channel_prompt since
config bridging already handles numeric YAML key conversion
- Remove dead prompts.get(str(key)) fallback that could never match
after keys were already normalized to strings
- Replace getattr(event, "channel_prompt", None) with direct attribute
access since channel_prompt is a declared dataclass field
- Update test to verify normalization responsibility lives in config bridging
When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain
malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken
shell config — the OpenAI/httpx client throws a cryptic 'Invalid port'
error that doesn't identify the offending variable.
Add _validate_proxy_env_urls() and _validate_base_url() in
auxiliary_client.py, called from resolve_provider_client() and
_create_openai_client() to fail fast with a clear, actionable error
message naming the broken env var or URL.
Closes#6360
Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>
Found via trace data audit: JWT tokens (eyJ...) and Discord snowflake
mentions (<@ID>) were passing through unredacted.
JWT pattern: matches 1/2/3-part tokens starting with eyJ (base64 for '{').
Zero false-positive risk — no normal text matches eyJ + 10+ base64url chars.
Discord pattern: matches <@digits> and <@!digits> with 17-20 digit snowflake
IDs. Syntactically unique to Discord's mention format.
Both patterns follow the same structural-uniqueness standard as existing
prefix patterns (sk-, ghp_, AKIA, etc.).
_parse_session_key() blindly assigned parts[5] as thread_id for all
chat types. For group sessions with per-user isolation, parts[5] is
a user_id, not a thread_id. This could cause shutdown notifications
to route with incorrect thread metadata.
Only return thread_id for chat types where the 6th element is
unambiguous: dm and thread. For group/channel sessions, omit
thread_id since the suffix may be a user_id.
Based on the approach from PR #9938 by @Ruzzgar.
OpenCode Go does not expose a shared /models endpoint, so the doctor
probe was always failing and producing a false warning. Set the default
URL to None and disable the health check for this provider.
Both /queue and /quit registered 'q' as an alias. Since /quit appeared
later in COMMAND_REGISTRY, _build_command_lookup() silently overwrote
/queue's claim, making the documented /queue shorthand unusable.
Fix: remove 'q' from /quit's aliases. /quit already has 'exit' as an
alias plus the full '/quit' command. /queue has no other short alias.
Closes#10467
The recovery block previously only retried (continue) when one of the
per-component sanitization checks (messages, tools, system prompt,
headers, credentials) found and stripped non-ASCII content. When the
non-ASCII lived only in api_messages' reasoning_content field (which
is built from messages['reasoning'] and not checked by the original
_sanitize_messages_non_ascii), all checks returned False and the
recovery fell through to the normal error path — burning a retry
attempt despite _force_ascii_payload being set.
Now the recovery always continues (retries) when _is_ascii_codec is
detected. The _force_ascii_payload flag guarantees the next iteration
runs _sanitize_structure_non_ascii(api_kwargs) on the full API payload,
catching any remaining non-ASCII regardless of where it lives.
Also adds test for the 'reasoning' field on canonical messages.
Fixes#6843
The ASCII-locale recovery path in run_agent.py sanitized the canonical
'messages' list but left 'api_messages' untouched. api_messages is a
separate API-copy built before the retry loop and may carry extra fields
(reasoning_content, extra_body entries) that are not present in
'messages'. This caused the retry to still raise UnicodeEncodeError even
after the 'System encoding is ASCII — stripped...' log line appeared.
Two changes:
- _sanitize_messages_non_ascii now walks all extra top-level string fields
in each message dict (any key not in {content, name, tool_calls, role})
so reasoning_content and future extras are cleaned in both 'messages'
and 'api_messages'.
- The ASCII-codec recovery block now also calls sanitize on api_messages
and api_kwargs so no non-ASCII survives into the next retry attempt.
Adds regression tests covering:
- reasoning_content with non-ASCII in api_messages
- extra_body with non-ASCII in api_kwargs
- canonical messages clean but api_messages dirty
Fixes#6843
After clear_session_vars() reset contextvars to their default (''),
get_session_env() treated the empty string as falsy and fell through
to os.environ — resurrecting stale HERMES_SESSION_* values from CLI
startup, cron, or previous sessions. This broke session isolation
in the gateway where concurrent messages could see each other's
stale environment values.
Fix: use a sentinel (_UNSET) as the contextvar default instead of ''.
get_session_env() now checks 'value is not _UNSET' instead of
truthiness. Three states are cleanly distinguished:
- _UNSET (never set): fall back to os.environ (CLI/cron compat)
- '' (explicitly cleared): return '' — no os.environ fallback
- 'telegram' (actively set): return the value
clear_session_vars() now uses var.set('') instead of var.reset(token)
to mark vars as explicitly cleared rather than reverting to _UNSET.
Closes#10304
When a model (e.g. mimo-v2-pro) streams intermediate text alongside tool
calls ("Let me search for that") but then returns empty after processing
tool results, the stream consumer already_sent flag is True from the
earlier text delivery. The gateway suppression check
(already_sent=True, failed=False → return None) would swallow the final
response, leaving the user staring at silence after the search.
Two changes:
1. gateway/run.py return path: skip already_sent suppression when the
final_response is "(empty)" or empty — the user needs to know the
agent finished even if streaming sent partial content earlier.
2. gateway/run.py response handler: convert the internal "(empty)"
sentinel to a user-friendly warning instead of delivering the raw
sentinel string.
Tests added for all empty/None/sentinel cases plus preserved existing
suppression behavior for normal non-empty responses.
Memory provider discovery (discover_memory_providers, load_memory_provider)
only scanned the bundled plugins/memory/ directory. User-installed providers
at $HERMES_HOME/plugins/<name>/ were invisible, forcing users to symlink
into the repo source tree — which broke on hermes update and created a
dual-registration path causing duplicate tool names (400 errors on strict
providers like Xiaomi MiMo).
Changes:
- Add _get_user_plugins_dir(), _is_memory_provider_dir(), _iter_provider_dirs(),
and find_provider_dir() helpers to plugins/memory/__init__.py
- discover_memory_providers() now scans both bundled and user dirs
- load_memory_provider() uses find_provider_dir() (bundled-first)
- discover_plugin_cli_commands() uses find_provider_dir()
- _install_dependencies() in memory_setup.py uses find_provider_dir()
- User plugins use _hermes_user_memory namespace to avoid sys.modules collisions
- Non-memory user plugins filtered via source text heuristic
- Bundled providers always take precedence on name collisions
Fixes#4956, #9099. Supersedes #4987, #9123, #9130, #9132, #9982.