Address Copilot review findings:
1. Gate _last_activity_desc on interrupt_depth == 0 alongside _last_activity_ts.
Both fields are semantically paired — desc describes the activity *at* ts.
Updating desc without ts made get_activity_summary() report "starting new
turn (cached)" for 20+ minutes while the timestamp showed the true stale
duration, producing misleading diagnostic output.
2. Monkeypatch gateway.run.time.time to a fixed epoch in tests that assert
on _last_activity_ts values. Real time.time() comparisons were latently
flaky under slow CI or NTP adjustments. _FAKE_NOW = 10_000.0 is used
as the reference; assertions are now exact equality rather than >=.
3. Add test_fresh_turn_resets_desc and test_interrupt_turn_preserves_desc to
directly cover the gated desc behaviour introduced by (1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_last_activity_ts was unconditionally reset to time.time() on every
_agent_cache hit. For interrupt-recursive _run_agent calls
(_interrupt_depth > 0) this silently reset the inactivity watchdog's
idle clock on each re-entry, preventing the 30-min timeout from ever
firing when a turn got stuck in an interrupt loop. A stuck session
would emit "Still working... iteration 0/60, starting new turn (cached)"
heartbeats indefinitely instead of timing out.
Gate the reset on _interrupt_depth == 0 only. Fresh external turns
still receive the reset so a session idle for 29 min doesn't trip the
watchdog before the new turn makes its first API call (#9051).
The per-turn reset logic is extracted into a static helper
_init_cached_agent_for_turn() to make it directly testable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to cherry-picked PR #15920:
- agent/credential_pool.py: hoist 'from hermes_cli.config import get_env_value'
to module top instead of inline try/except in each seed site (3 sites).
No import cycle — hermes_cli/config.py doesn't depend on agent.credential_pool.
- hermes_cli/auth.py: same hoist for the _resolve_api_key_provider_secret loop.
- tests/tools/test_credential_pool_env_fallback.py: replace smoke-only tests
with real .env file I/O. Each test writes a temp ~/.hermes/.env, verifies
_seed_from_env / _resolve_api_key_provider_secret read from it, and asserts
the full priority chain: os.environ > .env > credential_pool. Uses
'deepseek' as the test provider since 'openai' isn't in PROVIDER_REGISTRY
and _seed_from_env's generic path requires a real pconfig lookup.
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
the running-agent guard (board writes don't touch agent state), so
/kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
Follow-up to PR #16053 (/btw as /background alias). Cleans up the
plumbing added exclusively for the old ephemeral /btw handler and
repairs a broken btw bypass that landed between my refactor and this
follow-up.
run_agent.py:
- Remove persist_session kwarg, instance attr, and _persist_session
short-circuit. Only /btw ever passed persist_session=False; with
/btw gone the default (always persist) is the only behavior anyone
ever wanted.
gateway/run.py:
- Remove the unreachable 'if _cmd_def_inner.name == "btw"' block
(PR #16059). Canonical name for a /btw message is 'background' after
alias resolution — the comparison could never be true, and it called
_handle_btw_command which no longer exists. The /background branch
above it already dispatches /btw correctly.
tests/gateway/test_running_agent_session_toggles.py:
- Fix test_btw_dispatches_mid_run to mock _handle_background_command
(the real dispatch target for /btw) instead of the deleted
_handle_btw_command.
/btw spawns a parallel ephemeral side-question task (self-guarded against
concurrent /btw on the same chat) — exactly like /background. But it was
missing from the running-agent bypass list in _handle_message(), so it
fell through to the catch-all and returned:
⏳ Agent is running — /btw can't run mid-turn. Wait for the current
response or /stop first.
That's the opposite of what /btw is for — asking a side question while
the main turn is still working. Add the bypass next to /background and a
regression test covering the mid-turn dispatch path.
Reported by @IuriiTiunov on Telegram.
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY). This extends the same
latch to the TUI with TUI-native wording.
The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts. The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.
Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
_on_tool_complete for the 30s/tool_progress=all path. Same
config.yaml latch so each hint fires at most once per install across
CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
section explaining the TUI's double-Enter model and the hints
Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
tests/tui_gateway/test_protocol.py \
tests/gateway/test_busy_session_ack.py
-> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint -> clean
npm --prefix ui-tui run build -> clean
Manage the fallback_providers chain from the CLI instead of hand-editing
config.yaml. The picker reuses select_provider_and_model() from 'hermes
model' — same provider list, same credential prompts, same model picker.
hermes fallback [list] Show the current chain (primary + fallbacks)
hermes fallback add Run the model picker, append selection to chain
hermes fallback remove Pick an entry to delete (arrow-key menu)
hermes fallback clear Remove all entries (with confirmation)
'add' snapshots config['model'] before calling the picker, extracts the
user's selection from the post-picker state, then restores the primary
and appends {provider, model, base_url?, api_mode?} to fallback_providers.
Auth store's active_provider is snapshot/restored too so OAuth-provider
fallbacks don't silently deactivate the user's primary. Duplicates and
self-as-fallback are rejected. Legacy single-dict 'fallback_model' entries
are auto-migrated to the list format on first write.
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the busy-ack
explaining the /busy queue vs /busy interrupt knob, phrased to match the
mode that was just applied (don't tell a queue-mode user to switch to
queue).
2. First tool that runs for >= 30s in the noisiest progress mode
(tool_progress: all) — prints a hint about /verbose to cycle display
modes (all -> new -> off -> verbose). Gated on /verbose actually being
usable on the surface: always shown on CLI; on gateway only shown when
display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
load_cli_config defaults (cli.py). No _config_version bump — deep
merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
after building the ack. progress_callback tracks tool.completed
duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
Enter; _on_tool_progress appends the tool-progress hint on the first
>=30s tool completion. In-memory CLI_CONFIG is also updated so
subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
The base adapter's auto-TTS path fired on any voice message unless the
chat had explicitly run /voice off — it never read voice.auto_tts from
config.yaml, so users who set auto_tts: false still got audio replies.
Gate the base adapter on a three-layer decision instead:
1. chat in _auto_tts_enabled_chats (explicit /voice on|tts) → fire
2. chat in _auto_tts_disabled_chats (explicit /voice off) → suppress
3. else → voice.auto_tts global default
Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default
and mirrors /voice on|tts chats into _auto_tts_enabled_chats via the
existing _sync_voice_mode_state_to_adapter path. /voice off still wins.
Closes#16007.
When the LLM response carries N parallel tool calls, the agent fires
N tool.started events back-to-back before its interrupt check runs.
A user sending /stop mid-batch would see the '⚡ Interrupting current
task' ack followed by a trail of 🔍 web_search bubbles for the remaining
events in the batch — making the interrupt feel ignored.
progress_callback and the drain loop in send_progress_messages now
check agent.is_interrupted (via agent_holder[0], the existing
cross-scope handle). Events that arrive after interrupt are dropped
at both the queueing and rendering stages. The '⚡ Interrupting'
message is sent through a separate adapter path and is unaffected.
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a release.
Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).
Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).
Config (new model_catalog section, default enabled):
model_catalog.url master manifest URL
model_catalog.ttl_hours disk cache TTL (default 24h)
model_catalog.providers.<name>.url optional per-provider override
Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.
Changes:
- website/static/api/model-catalog.json initial manifest (35 OR + 31 Nous)
- scripts/build_model_catalog.py regenerator from in-repo lists
- hermes_cli/model_catalog.py fetch + validate + cache module
- hermes_cli/models.py fetch_openrouter_models() +
new get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py Nous flows use the helper
- hermes_cli/config.py model_catalog defaults
- website/docs/reference/model-catalog.md + sidebars.ts
- tests/hermes_cli/test_model_catalog.py 21 tests (validation, fetch
success/failure, accessors,
disabled, overrides, integration)
Stop pre-stripping the path from the configured MCP server URL before
constructing OAuthClientProvider. The MCP SDK strips the path itself via
OAuthContext.get_authorization_base_url() for authorization-server
discovery, but uses the full server_url through
resource_url_from_server_url() + check_resource_allowed() to validate
against the server's RFC 9728 Protected Resource Metadata.
For servers whose PRM advertises a path-scoped resource (e.g. Notion's
https://mcp.notion.com/mcp), our _parse_base_url() collapsed the URL to
the origin, so check_resource_allowed() saw requested='/' vs
configured='/mcp/' and refused the token. Fixes OAuth against Notion MCP
(and any other path-scoped resource).
Closes#16015.
`_apply_model_switch_result` (the interactive `/model` picker's
confirmation path) printed `ModelInfo.context_window` straight from
models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on
openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker
showed 1M while the runtime (compressor, gateway `/model`, typed
`/model <name>`) correctly used 272K — the classic 'sometimes 1M,
sometimes 272K' mismatch on a single model.
Both display paths now go through `resolve_display_context_length()`,
matching the fix that `_handle_model_switch` received earlier.
Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS
(`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the
272K Codex cap is already enforced via the Codex-OAuth branch, so the
fallback now reflects what every non-Codex probe-miss should see.
Tests: adds `test_apply_model_switch_result_context.py` with three
scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls
back to ModelInfo). Updates the existing non-Codex fallback test to
assert 1.05M (the correct value).
## Validation
| path | before | after |
|-------------------------------|-----------|-----------|
| picker -> gpt-5.5 on Codex | 1,050,000 | 272,000 |
| picker -> gpt-5.5 on OpenAI | 1,050,000 | 1,050,000 |
| picker -> gpt-5.5 on OpenRouter | 1,050,000 | 1,050,000 |
| typed /model gpt-5.5 on Codex | 272,000 | 272,000 |
#14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native
provider but the context-window lookup still falls back to the existing
"deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context
window, so any caller relying on get_model_context_length() for
pre-flight token budgeting (compression, context warnings) under-counts
by ~8x.
Add explicit lowercase entries for the four DeepSeek model ids that
ship 1M context:
- deepseek-v4-pro
- deepseek-v4-flash
- deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking)
- deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking)
Longest-key-first substring matching means these explicit entries also
cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter
and Nous Portal) without regressing the existing 128K fallback for
older / unknown DeepSeek model ids on custom endpoints.
Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing
The background skill-review prompt (spawned after N user turns) now instructs
the reviewer to SURVEY existing skills first, identify the CLASS of task, and
PREFER updating/generalizing an existing skill over creating a new narrow one.
This reduces near-duplicate skill accumulation at the source. Catches the
common failure mode where repeated tasks of the same class each spawn their
own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead
of a single class-level skill ("desktop-app-build-troubleshooting").
Applied to both _SKILL_REVIEW_PROMPT and the **Skills** half of
_COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged.
Groundwork for the Curator feature (issue #7816) — the creation-side fix.
Curator handles the retirement/consolidation side in a follow-up PR.
Tests assert the behavioral instructions are present (survey, class, update-
over-create, overlap-flagging, opt-out clause) rather than snapshotting the
full prompt text.
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of
those models recorded a cross-session file breaker that blocked EVERY
model on Nous for the cooldown window -- even though the caller's
own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro
capacity error, restarted, switched to Kimi 2.6, and still got
'Nous Portal rate limit active -- resets in 46m 53s'.
Nous already emits the full x-ratelimit-* header suite on every
response (captured by rate_limit_tracker into agent._rate_limit_state).
We now gate the breaker on that data: trip it only when either the
429's own headers or the last-known-good state show a bucket with
remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s
(healthy buckets everywhere, but upstream out of capacity) fall
through to normal retry/fallback and the breaker is never written.
Note: the in-memory 'restart TUI/gateway to clear' workaround
circulated in Discord does NOT work -- the breaker is file-backed at
~/.hermes/rate_limits/nous.json. The workaround for users still
affected by a bad state file is to delete it.
Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.
Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
_serialize_payload already surfaces non-top-level kwargs under
payload['extra'], so duration_ms lands at extra.duration_ms for
shell-hook scripts
Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.
Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).
Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
Adds a floor below --yolo: a tiny set of commands so catastrophic they
should never run via the agent, regardless of --yolo, gateway /yolo,
approvals.mode=off, or cron approve mode. Opting into yolo is trusting
the agent with your files and services — not trusting it to wipe the
disk or power the box off.
The list is deliberately small (12 patterns), covering only
unrecoverable ops:
- rm -rf targeting /, /home, /etc, /usr, /var, /boot, /bin, /sbin,
/lib, ~, $HOME
- mkfs (any variant)
- dd + redirection to raw block devices (/dev/sd*, /dev/nvme*, etc.)
- fork bomb
- kill -1 / kill -9 -1
- shutdown, reboot, halt, poweroff, init 0/6, telinit 0/6,
systemctl poweroff/reboot/halt/kexec
Recoverable-but-costly commands (git reset --hard, rm -rf /tmp/x,
chmod -R 777, curl | sh) stay in DANGEROUS_PATTERNS where yolo can
still pass them through — that's what yolo is for.
Container backends (docker/singularity/modal/daytona) continue to
bypass both hardline and dangerous checks, since nothing they do can
touch the host.
Inspired by Mercury Agent's permission-hardened blocklist.
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.
Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section: `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default)
The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them.
The 'Quick Setup - configure missing items only' menu option is now
exposed as the explicit `--quick` flag; it's the narrow case of
filling in missing config (e.g. after a partial OpenClaw migration or
when a required API key got cleared).
Inspired by Mercury Agent's `mercury doctor` UX.
Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
(guarding behavior that no longer exists — covered by
test_setup_reconfigure.py instead)
The azure-foundry wizard now probes the endpoint before asking the user
to pick anything by hand:
1. URL path sniff — endpoints ending in /anthropic are Azure Foundry
Claude routes and skip to anthropic_messages.
2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped
model list, we switch to chat_completions and prefill the picker
with the returned deployment/model IDs.
3. Anthropic Messages probe — fallback for endpoints that don't expose
/models but do speak the Anthropic Messages shape.
4. Manual fallback — private endpoints / custom routes still work;
the user picks API mode + types a deployment name.
Context length for the selected model is resolved through the existing
agent.model_metadata.get_model_context_length chain (models.dev,
provider metadata, hardcoded family fallbacks) and stored in
model.context_length when a non-default value is found.
Also refactors runtime_provider so Azure Foundry resolution is reused
between the explicit-credentials path and the default top-level path —
previously the /v1 strip for Anthropic-style Azure only ran when the
caller passed explicit_* args, which meant config-driven sessions
hit a double-/v1 URL.
New module hermes_cli/azure_detect.py with 19 unit tests covering:
- path sniff, model ID extraction, probe fallbacks
- HTTP error handling (URLError, HTTPError)
- context-length lookup passthrough
- DEFAULT_FALLBACK_CONTEXT rejection
New runtime tests cover:
- OpenAI-style Azure Foundry
- Anthropic-style Azure Foundry with /v1 stripping
- Missing base_url / API key raising AuthError
Rationale: Microsoft confirms there's no pure-API-key endpoint to list
Azure deployments (that requires ARM management auth). The v1 Azure
OpenAI endpoint does expose /models with the resource's available
model catalog, which is good enough for picker prefill in the common
case. Users on private/gated endpoints fall through to manual entry.
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:
1. `_max_tokens_param()` only sent `max_completion_tokens` for
`api.openai.com` URLs. Azure also requires `max_completion_tokens`
for gpt-5.x models.
2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
to Responses API. Azure does NOT support the Responses API — it serves
gpt-5.x on the regular `/chat/completions` path, causing a 404.
Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
`chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.
gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).
Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
Fixes#15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.
## What changed
New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.
`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).
Wired through five call sites that previously either duplicated the lookup or ignored it entirely:
- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)
## Context probe tiers
`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.
## Repro (from #15779)
```yaml
custom_providers:
- name: my-custom-endpoint
base_url: https://example.invalid/v1
model: gpt-5.5
models:
gpt-5.5:
context_length: 1050000
```
`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".
## Tests
- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier
Add ability to interrupt a running agent via the runs API. Previously
/v1/runs could start a run and subscribe to events, but there was no
way to cancel it. The new endpoint stores agent and task references
during execution, calls agent.interrupt() to stop LLM calls, then
cancels the asyncio task.
Includes 15 tests covering start, events, and stop scenarios.
When the user interrupts a long-running task, prompt_toolkit tries to
flush stdout during emergency shutdown. If stdout is in a broken state
(redirected to /dev/null, pipe closed, terminal gone), the flush raises
`OSError: [Errno 5] Input/output error` which propagates unhandled and
crashes the CLI.
Two defense layers:
1. `_suppress_closed_loop_errors`: add `OSError` with `errno.EIO` to
the asyncio exception handler, matching the existing pattern for
`RuntimeError("Event loop is closed")` and `KeyError("is not
registered")`.
2. Outer `except (KeyError, OSError)` block: add `errno.EIO` check
before the existing string-match guards, silently suppressing the
error instead of printing a misleading stdin-related message.
Fixes#13710.
The raw-template lookup added in PR #15817 went through
`get_compatible_custom_providers(read_raw_config())`, which calls
`_normalize_custom_provider_entry` → `urlparse(base_url)`. Any
entry whose `base_url` is itself an env-ref (`${NEURALWATT_API_BASE}`)
was dropped as 'not a valid URL', so `api_key_ref` stayed empty and the
resolved secret was still written to `model.api_key` — the exact case
the original Discord report described.
Replace the normalizer-gated lookup with a direct read of
`raw['custom_providers']` and `raw['providers']`, indexed by name
(case-insensitive, optionally qualified by model) so the loaded
(expanded) entry can be matched regardless of how `base_url` is
written.
Add an integration regression test driving the real
`select_provider_and_model` entry point with the Discord-reported
NeuralWatt config (`${VAR}` in both `base_url` and `api_key`).
This test fails on the PR-only fix and passes with the broadened
lookup.
Previously _copy_reasoning_content_for_api only padded reasoning_content
when the assistant message had tool_calls. DeepSeek V4 thinking mode
requires the field on every assistant turn, including plain text replies
without tool_calls.
- Remove the 'source_msg.get("tool_calls") and' guard
- Update test: plain assistant turns now get padded for DeepSeek/Kimi
Fixes#15213
- expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution
- keep startup alias resolution network-free and add regression tests in models and tui gateway suites
The Codex Responses API rejects input_text inside assistant messages —
only output_text and refusal are valid content types for assistant role.
_chat_content_to_responses_parts() previously hardcoded all text content
to input_text regardless of the message role. When an assistant message
had list-format content (multimodal or structured), this produced invalid
input_text parts that the API rejected with:
Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.
Fix: add a role parameter to _chat_content_to_responses_parts() that
selects output_text for assistant messages and input_text for user
messages. Thread this through _chat_messages_to_responses_input() and
_preflight_codex_input_items().
Fixes#15687
When a user sends /stop during a streaming API call, the outer poll loop
detects _interrupt_requested and closes the HTTP connection. However, the
inner _call() thread catches the connection error and enters its retry
loop — opening a FRESH connection without checking the interrupt flag.
On slow providers like ollama-cloud, each retry attempt blocks for the
full stream-read timeout (120s+). With 3 retry attempts this caused
510+ second delays between /stop and actual response — the agent appeared
completely unresponsive despite the stop being acknowledged.
Fix: add an _interrupt_requested check at the top of the streaming retry
loop so the agent exits immediately instead of retrying.
Also fix log truncation: all session key logging in gateway/run.py used
[:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437'
(33 chars) to 'agent:main:telegram:' — losing the identifying chat type
and user ID. Replace with full keys to make logs debuggable.
Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
path when introduced; the background review now fires every 10 user
turns on CLI and gateway alike, which is far more frequent than
compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live agent
before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix
(system prompt + memory-only tool list) that diverged from the live
conversation's cached prefix, invalidating prompt caching. The
gateway variant spawned a fresh AIAgent with its own clean prompt
for each finalized session — still cache-breaking, just in a
different process.
- Redundant. Background review runs in the live conversation's
session context, gets the same content, writes to the same memory
store, and doesn't break the cache. Everything flush_memories
claimed to preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
(and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
_check_compression_model_feasibility (headroom was only needed
because flush dragged the full main-agent system prompt along;
the compression summariser sends a single user-role prompt so
new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
The session-expiry watcher still uses the flag to avoid re-running
finalize/eviction on the same expired session; the new name
reflects what it now actually gates. from_dict() reads
'expiry_finalized' first, falls back to the legacy 'memory_flushed'
key so existing sessions.json files upgrade seamlessly.
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites. No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
_check_compression_model_feasibility calls get_model_context_length
without provider=, so Codex OAuth users get 1,050,000 (from models.dev
for 'openai') instead of the actual 272,000 limit. This happens because
_infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'),
skipping the Codex-specific resolution branch entirely.
Result: compression threshold set at 85% of 1.05M = 892K — conversations
never trigger compression, the context grows unbounded, and when gateway
hygiene eventually forces compression, the Codex endpoint drops the
oversized streaming request ('peer closed connection without sending
complete message body').
Fix: forward self.provider to get_model_context_length so provider-
specific resolution branches (Codex OAuth 272K, Copilot live /models,
Nous suffix-match) fire correctly.
Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).
* fix(terminal): three-layer defense against watch_patterns notification spam
Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.
Three layered defenses, each sufficient on its own:
1. Mutual exclusion (terminal_tool.py): When both flags are set on a
background process, drop watch_patterns with a warning. notify_on_complete
wins because 'let me know when it's done' is the more useful signal and
fires exactly once. Extracted as _resolve_notification_flag_conflict() so
the rule is testable in isolation.
2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
bails the moment session.exited is True. Post-exit chunks (buffered reads
draining after the process is gone) no longer produce notifications. This
is the fix flagged as future work in session 20260418_020302_79881c.
3. Global circuit breaker (process_registry.py): Per-session rate limits don't
catch the sibling-flood case — N concurrent processes can each stay under
8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
trips a 30-second cooldown across ALL sessions, emits a single
watch_overflow_tripped event, silently counts dropped events, and emits a
watch_overflow_released summary when the cooldown ends.
Also updates the tool schema + docstring to document the new behavior.
Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.
Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.
* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion
Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.
## New rule — per session
- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
for the session and the session is auto-promoted to notify_on_complete
semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
strike counter — healthy cadence is forgiven.
## Schema + docstring rewritten
The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
description so the model sees the contract every turn
## Removed
- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
/ _watch_strike_candidate / _watch_consecutive_strikes.
## Kept
- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
secondary safety net for concurrent siblings. Still valuable when 20
short-lived processes each fire once — none individually violates the
per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.
## Tests
- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
second in cooldown suppressed, multi-drop = single strike, 3 strikes
disables + promotes, clean window resets counter, suppressed count
carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
* feat(dashboard): page-scoped plugin slots for built-in pages
Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.
Previously, plugins could only:
1. Add new tabs (tab.path)
2. Replace whole built-in pages (tab.override)
3. Inject into global shell slots (header-*, footer-*, pre-main, ...)
None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.
Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
(sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
<PluginSlot name="<page>:top" />
as the first child of its outer wrapper and
<PluginSlot name="<page>:bottom" />
as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
sessions:top via registerSlot(), with matching slots entry in
the manifest -- so freshly-setup users can see what page-scoped
slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
colon-bearing slot names (sessions:top, analytics:bottom, ...)
Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
* fix(terminal): three-layer defense against watch_patterns notification spam
Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.
Three layered defenses, each sufficient on its own:
1. Mutual exclusion (terminal_tool.py): When both flags are set on a
background process, drop watch_patterns with a warning. notify_on_complete
wins because 'let me know when it's done' is the more useful signal and
fires exactly once. Extracted as _resolve_notification_flag_conflict() so
the rule is testable in isolation.
2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
bails the moment session.exited is True. Post-exit chunks (buffered reads
draining after the process is gone) no longer produce notifications. This
is the fix flagged as future work in session 20260418_020302_79881c.
3. Global circuit breaker (process_registry.py): Per-session rate limits don't
catch the sibling-flood case — N concurrent processes can each stay under
8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
trips a 30-second cooldown across ALL sessions, emits a single
watch_overflow_tripped event, silently counts dropped events, and emits a
watch_overflow_released summary when the cooldown ends.
Also updates the tool schema + docstring to document the new behavior.
Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.
Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.
* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion
Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.
## New rule — per session
- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
for the session and the session is auto-promoted to notify_on_complete
semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
strike counter — healthy cadence is forgiven.
## Schema + docstring rewritten
The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
description so the model sees the contract every turn
## Removed
- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
/ _watch_strike_candidate / _watch_consecutive_strikes.
## Kept
- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
secondary safety net for concurrent siblings. Still valuable when 20
short-lived processes each fire once — none individually violates the
per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.
## Tests
- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
second in cooldown suppressed, multi-drop = single strike, 3 strikes
disables + promotes, clean window resets counter, suppressed count
carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
`hermes tools` → "reconfigure existing" listed Spotify twice because
the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174)
left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets()
unconditionally appended get_plugin_toolsets() on top, so the same
'spotify' key showed up from both sources.
Dedupe by key — built-in CONFIGURABLE_TOOLSETS entry wins (it has the
nicer label and description). Also guards against future bundled plugins
that share a toolset key with a built-in.
Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter — — not just temperature.
- agent/auxiliary_client.py:
* New _is_unsupported_parameter_error(exc, param): matches the same six
phrasings the old temperature detector did plus 'unrecognized parameter'
and 'invalid parameter', against any named param.
* _is_unsupported_temperature_error is now a thin back-compat wrapper so
existing imports and tests keep working.
* The max_tokens → max_completion_tokens retry branch in call_llm and
async_call_llm now (a) gates on 'max_tokens is not None' so we do not
pop a key that was never set and silently substitute a None value on
the retry, and (b) also matches the generic helper in addition to the
legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
up phrasings like 'Unknown parameter: max_tokens' that previously slipped
through.
- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
the generic detector across params, the back-compat wrapper, and the two
hardenings to the max_tokens retry branch (None gate + generic phrasing).
Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:
new_threshold = aux_context
This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.
Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.
This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.
Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.
The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.
call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.
Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
wire into both sync and async call_llm paths before the existing
max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
detector phrasings, sync + async retry, no-retry-without-temperature,
and non-temperature 400s not triggering the retry
Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.
Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.
Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>
The memory-flush fallback for api_mode='codex_responses' was unconditionally
adding `temperature` to codex_kwargs before calling _run_codex_stream. The
Responses API does not accept temperature on any supported backend:
- chatgpt.com/backend-api/codex rejects it outright
- api.openai.com + gpt-5/o-series reasoning models reject it
- Copilot Responses rejects it on reasoning models
The CodexAuxiliaryClient adapter and the codex_responses transport both
correctly omit temperature — the flush fallback was the only path putting
it back. On errors from the primary aux path (e.g. expired OAuth token),
users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter:
temperature`.
Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.
Both discord (read/participate) and discord_admin (server admin) are now
configurable via `hermes tools` with default-OFF. Previously the core
discord tool (fetch_messages, search_members, create_thread) auto-loaded
on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user
never opted into.
Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so
the discord toolsets only show up in the Discord platform's checklist,
not on CLI/Telegram/Slack/etc. Applied at four gates:
- _prompt_toolset_checklist: checklist filter
- _get_platform_tools: resolution filter (both branches)
- _save_platform_tools: save-time filter (covers 'Configure all
platforms' and hand-edited config.yaml)
- tools_disable_enable_command: rejects `hermes tools enable discord`
on non-Discord platforms with a clear error
build_session_context_prompt now injects the Discord IDs block only
when both conditions hold: the discord/discord_admin toolset is
enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough —
the tool's check_fn gates on the token at registry time, so opting
in without a token yields no tools and the IDs block would lie.
Otherwise keep the stale-API disclaimer.
The feishu_doc and feishu_drive tools were registered in the tool
registry but never added to the hermes-feishu composite toolset.
The pipeline fix from the prior commit now recovers them automatically
once they are in the composite.
Split the monolithic discord_server tool (14 actions) into two:
- discord: core actions (fetch_messages, search_members, create_thread)
that are useful for the agent's normal operation. Auto-enabled on
the discord platform via the pipeline fix.
- discord_admin: server management actions (list channels/roles, pins,
role assignment) that require explicit opt-in via hermes tools.
Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.
The reverse-mapping loop in _get_platform_tools only checked
CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets
like discord and feishu_doc whose tools were in the composite but
had no configurable key. Add a second pass over TOOLSETS that picks
up unclaimed toolsets whose tools are present in the resolved
composite.
The tool schema promised 'On update, pass an empty array to clear' but the
update branch ignored the context_from kwarg entirely — users could set
the field at create time and never modify or clear it afterward.
- tools/cronjob_tools.py: handle context_from in the update branch the
same way script/enabled_toolsets/workdir are handled: normalize str/list
to refs, validate each referenced job exists (same check the create
branch does), store as list-or-None to match create_job()'s shape.
Empty string or empty list clears the field.
- tests/cron/test_cron_context_from.py: 6 new tests covering add/change/
clear (both shapes)/bad-ref/preserve-across-unrelated-update.
YAML parses bare numeric toolset names (e.g. 12306:) as int, causing
TypeError in sorted() since the read path normalizes to str but the
save path did not.
The no_mcp sentinel was preserved in existing entries even when the
user re-enabled MCP servers, causing MCP to stay silently disabled.
update_model() recalculated threshold_tokens but left tail_token_budget
and max_summary_tokens at their __init__ values. When switching from a
200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K)
instead of the intended ~10%.
Adds budget recalculation in update_model() and 2 regression tests.
Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval
callback lives in tools/terminal_tool.py's threading.local(), which worker
threads do not inherit. When a subagent hits a dangerous-command guard,
prompt_dangerous_approval() falls back to input() from the worker thread,
deadlocking against the parent's prompt_toolkit TUI that owns stdin.
Fix: install a non-interactive callback into every subagent worker thread
via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)).
The callback is config-gated by delegation.subagent_auto_approve:
false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist)
true -> _subagent_auto_approve (opt-in YOLO for cron/batch)
Both emit a logger.warning audit line. Gateway sessions are unaffected
because they resolve approvals via tools/approval.py's per-session queue,
not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685).
- hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False
- cli-config.yaml.example: documented, commented (default)
- tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve,
_get_subagent_approval_callback, wired into the child timeout executor
- tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion,
and TLS scoping in the worker thread
Two adjustments to make CI pass:
- In gateway/platforms/matrix.py: `DeviceID` is `NewType("DeviceID", str)`,
so passing `client.device_id` directly (already a str) works identically
at runtime. The explicit import was cosmetic and tripped CI environments
where `mautrix.types` doesn't re-export DeviceID at the expected path
("cannot import name 'DeviceID' from 'mautrix.types' (unknown location)").
- In tests/gateway/test_matrix.py: add `put_device_id` to the hand-written
`PgCryptoStore` fake so the three encryption-path tests
(test_connect_with_access_token_and_encryption,
test_connect_uses_configured_device_id_over_whoami,
test_connect_registers_encrypted_event_handler_when_encryption_on) can
exercise the new crypto-store binding without AttributeError.
/model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because
the display block used ModelInfo.context_window directly from models.dev.
Codex OAuth actually enforces 272K for the same slug, and the agent's
compressor already runs at 272K via get_model_context_length() — so the
banner + real context budget said 272K while /model lied with 1M.
Route the display context through a new resolve_display_context_length()
helper that always prefers agent.model_metadata.get_model_context_length
(which knows about Codex OAuth, Copilot, Nous caps) and only falls back
to models.dev when that returns nothing.
Fix applied to all 3 /model display sites:
cli.py _handle_model_switch
gateway/run.py picker on_model_selected callback
gateway/run.py text-fallback confirmation
Reported by @emilstridell (Telegram, April 2026).
The three google-workspace scripts (setup.py, google_api.py, gws_bridge.py)
each had their own way of resolving HERMES_HOME:
- setup.py imported hermes_constants (crashes outside Hermes process)
- google_api.py used os.getenv inline (no strip, no empty handling)
- gws_bridge.py defined its own local get_hermes_home() (duplicate)
Extract the common logic into _hermes_home.py which:
- Delegates to hermes_constants when available (profile support, etc.)
- Falls back to os.getenv with .strip() + empty-as-unset handling
- Provides display_hermes_home() with ~/ shortening for profiles
All three scripts now import from _hermes_home instead of duplicating.
7 regression tests cover the fallback path: env var override, default
~/.hermes, empty env var, display shortening, profile paths, and
custom non-home paths.
Closes#12722
Extracts _needs_kimi_tool_reasoning() for symmetry with the existing
_needs_deepseek_tool_reasoning() helper, so _copy_reasoning_content_for_api
uses the same detection logic as _build_assistant_message. Future changes
to either provider's signals now only touch one function.
Adds tests/run_agent/test_deepseek_reasoning_content_echo.py covering:
- All 3 DeepSeek detection signals (provider, model, host)
- Poisoned history replay (empty string fallback)
- Plain assistant turns NOT padded
- Explicit reasoning_content preserved
- Reasoning field promoted to reasoning_content
- Existing Kimi/Moonshot detection intact
- Non-thinking providers left alone
21 tests, all pass.
``run_conversation`` was calling ``memory_manager.sync_all(
original_user_message, final_response)`` at the end of every turn
where both args were present. That gate didn't consider the
``interrupted`` local flag, so an external memory backend received
partial assistant output, aborted tool chains, or mid-stream resets as
durable conversational truth. Downstream recall then treated the
not-yet-real state as if the user had seen it complete, poisoning the
trust boundary between "what the user took away from the turn" and
"what Hermes was in the middle of producing when the interrupt hit".
Extracted the inline sync block into a new private method
``AIAgent._sync_external_memory_for_turn(original_user_message,
final_response, interrupted)`` so the interrupt guard is a single
visible check at the top of the method instead of hidden in a
boolean-and at the call site. That also gives tests a clean seam to
assert on — the pre-fix layout buried the logic inside the 3,000-line
``run_conversation`` function where no focused test could reach it.
The new method encodes three independent skip conditions:
1. ``interrupted`` → skip entirely (the #15218 fix). Applies even
when ``final_response`` and ``original_user_message`` happen to
be populated — an interrupt may have landed between a streamed
reply and the next tool call, so the strings on disk are not
actually the turn the user took away.
2. No memory manager / no final_response / no user message →
preserve existing skip behaviour (nothing new for providerless
sessions, system-initiated refreshes, tool-only turns that never
resolved, etc.).
3. Sync_all / queue_prefetch_all exceptions → swallow. External
memory providers are strictly best-effort; a misconfigured or
offline backend must never block the user from seeing their
response.
The prefetch side-effect is gated on the same interrupt flag: the
user's next message is almost certainly a retry of the same intent,
and a prefetch keyed on the interrupted turn would fire against stale
context.
### Tests (16 new, all passing on py3.11 venv)
``tests/run_agent/test_memory_sync_interrupted.py`` exercises the
helper directly on a bare ``AIAgent`` (``__new__`` pattern that the
interrupt-propagation tests already use). Coverage:
- Interrupted turn with full-looking response → no sync (the fix)
- Interrupted turn with long assistant output → no sync (the interrupt
could have landed mid-stream; strings-on-disk lie)
- Normal completed turn → sync_all + queue_prefetch_all both called
with the right args (regression guard for the positive path)
- No final_response / no user_message / no memory manager → existing
pre-fix skip paths still apply
- sync_all raises → exception swallowed, prefetch still attempted
- queue_prefetch_all raises → exception swallowed after sync succeeded
- 8-case parametrised matrix across (interrupted × final_response ×
original_user_message) asserts sync fires iff interrupted=False AND
both strings are non-empty
Closes#15218
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.
Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.
Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
When display.busy_input_mode is 'queue', the runner-level PRIORITY block
in _handle_message was still calling running_agent.interrupt() for every
text follow-up to an active session. The adapter-level busy handler
already honors queue mode (commit 9d147f7fd), but this runner-level path
was an unconditional interrupt regardless of config.
Adds a queue-mode branch that queues the follow-up via
_queue_or_replace_pending_event() and returns without interrupting.
Salvages the useful part of #12070 (@knockyai). The config fan-out to
per-platform extra was redundant — runner already loads busy_input_mode
directly via _load_busy_input_mode().
skill_view response went to the model verbatim; duplicating the SKILL.md
body as raw_content on every tool call added token cost with no agent-facing
benefit. Remove the field and update tests to assert on content only.
The slash/preload caller (agent/skill_commands.py) already falls back to
content when raw_content is absent, and it calls skill_view(preprocess=False)
anyway, so content is already unrendered on that path.
Extends _repair_tool_call_arguments() to cover the most common local-model
JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and
newlines inside JSON string values (memory save summaries, file contents,
etc.). Previously fell through to '{}' replacement, losing the call.
Adds two repair passes:
- Pass 0: json.loads(strict=False) + re-serialise to canonical wire form
- Pass 4: escape 0x00-0x1F control chars inside string values, then retry
Ports the core utility from #12068 / PR #12093 without the larger plumbing
change (that PR also replaced json.loads at 8 call sites; current main's
_repair_tool_call_arguments is already the single chokepoint, so the
upgrade happens transparently for every existing caller).
Credit: @truenorth-lj for the original utility design.
4 new regression tests covering literal newlines, tabs, re-serialisation
to strict=True-valid output, and the trailing-comma + control-char
combination case.
When the streaming path (chat completions) assembled tool call deltas and
detected malformed JSON arguments, it set has_truncated_tool_args=True but
passed the broken args through unchanged. This triggered the truncation
handler which returned a partial result and killed the session (/new required).
_many_ malformations are repairable: trailing commas, unclosed brackets,
Python None, empty strings. _repair_tool_call_arguments() already existed
for the pre-API-request path but wasn't called during streaming assembly.
Now when JSON parsing fails during streaming assembly, we attempt repair
via _repair_tool_call_arguments() before flagging as truncated. If repair
succeeds (returns valid JSON), the tool call proceeds normally. Only truly
unrepairable args fall through to the truncation handler.
This prevents the most common session-killing failure mode for models like
GLM-5.1 that produce trailing commas or unclosed brackets.
Tests: 12 new streaming assembly repair tests, all 29 existing repair
tests still passing.
When a session is split by context compression mid-tool-call, an assistant
message may end up with truncated/invalid JSON in tool_calls[*].function.arguments.
On the next turn this is replayed verbatim and providers reject the entire request
with HTTP 400 invalid_tool_call_format, bricking the conversation in a loop that
cannot recover without manual session quarantine.
This patch adds a defensive sanitizer that runs immediately before
client.chat.completions.create() in AIAgent.run_conversation():
- Validates each assistant tool_calls[*].function.arguments via json.loads
- Replaces invalid/empty arguments with '{}'
- Injects a synthetic tool response (or prepends a marker to the existing one)
so downstream messages keep valid tool_call_id pairing
- Logs each repair with session_id / message_index / preview for observability
Defense in depth: corruption can originate from compression splits, manual edits,
or plugin bugs. Sanitizing at the send chokepoint catches all sources.
Adds 7 unit tests covering: truncated JSON, empty string, None, non-string args,
existing matching tool response (no duplicate injection), non-assistant messages
ignored, multiple repairs.
Fixes#15236
gpt-5.x on the Codex Responses API sometimes degenerates and emits
Harmony-style `to=functions.<name> {json}` serialization as plain
assistant-message text instead of a structured `function_call` item.
The intent never makes it into `response.output` as a function_call,
so `tool_calls` is empty and `_normalize_codex_response()` returns
the leaked text as the final content. Downstream (e.g. delegate_task),
this surfaces as a confident-looking summary with `tool_trace: []`
because no tools actually ran — the Taiwan-embassy-email bug report.
Detect the pattern, scrub the content, and return finish_reason=
'incomplete' so the existing Codex-incomplete continuation path
(run_agent.py:11331, 3 retries) gets a chance to re-elicit a proper
function_call item. Encrypted reasoning items are preserved so the
model keeps its chain-of-thought on the retry.
Regression tests: leaked text triggers incomplete, real tool calls
alongside leak-looking text are preserved, clean responses pass
through unchanged.
Reported on Discord (gpt-5.4 / openai-codex).
Covers the two bugs salvaged from PR #15161:
- test_batch_runner_checkpoint: TestFinalCheckpointNoDuplicates asserts
the final aggregated completed_prompts list has no duplicate indices,
and keeps a sanity anchor test documenting the pre-fix pattern so a
future refactor that re-introduces it is caught immediately.
- test_model_tools: TestCoerceNumberInfNan asserts _coerce_number
returns the original string for inf/-inf/nan/Infinity inputs and that
the result round-trips through strict (allow_nan=False) json.dumps.
TUI auto-resolves `display.personality` at session init, unlike the base CLI.
If config contains `agent.personalities: null`, `_resolve_personality_prompt`
called `.get()` on None and failed before model/provider selection.
Normalize null personalities to `{}` and surface a targeted config warning.
Tolerating null top-level keys silently drops user settings (e.g.
`agent.system_prompt` next to a bare `agent:` line is gone). Probe at
session create, log via `logger.warning`, and surface in the boot info
under `config_warning` — rendered in the TUI feed alongside the existing
`credential_warning` banner.
YAML parses bare keys like `agent:` or `display:` as None. `dict.get(key, {})`
returns that None instead of the default (defaults only fire on missing keys),
so every `cfg.get("agent", {}).get(...)` chain in tui_gateway/server.py
crashed agent init with `'NoneType' object has no attribute 'get'`.
Guard all 21 sites with `(cfg.get(X) or {})`. Regression test covers the
null-section init path reported on Twitter against the new TUI.
Follow-up to the canonical-identity session-key fix: pull the
JID/LID normalize/expand/canonical helpers into gateway/whatsapp_identity.py
instead of living in two places. gateway/session.py (session-key build) and
gateway/run.py (authorisation allowlist) now both import from the shared
module, so the two resolution paths can't drift apart.
Also switches the auth path from module-level _hermes_home (cached at
import time) to dynamic get_hermes_home() lookup, which matches the
session-key path and correctly reflects HERMES_HOME env overrides. The
lone test that monkeypatched gateway.run._hermes_home for the WhatsApp
auth path is updated to set HERMES_HOME env var instead; all other
tests that monkeypatch _hermes_home for unrelated paths (update,
restart drain, shutdown marker, etc.) still work — the module-level
_hermes_home is untouched.
Hermes' WhatsApp bridge routinely surfaces the same person under either
a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid),
and may flip between the two for a single human within the same
conversation. Before this change, build_session_key used the raw
identifier verbatim, so the bridge reshuffling an alias form produced
two distinct session keys for the same person — in two places:
1. DM chat_id — a user's DM sessions split in half, transcripts and
per-sender state diverge.
2. Group participant_id (with group_sessions_per_user enabled) — a
member's per-user session inside a group splits in half for the
same reason.
Add a canonicalizer that walks the bridge's lid-mapping-*.json files
and picks the shortest/numeric-preferred alias as the stable identity.
build_session_key now routes both the DM chat_id and the group
participant_id through this helper when the platform is WhatsApp.
All other platforms and chat types are untouched.
Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier
as public helpers. Plugins that need per-sender behaviour (role-based
routing, per-contact authorization, policy gating) need the same
identity resolution Hermes uses internally; without a public helper,
each plugin would have to re-implement the walker against the bridge's
internal on-disk format. Keeping this alongside build_session_key
makes it authoritative and one refactor away if the bridge ever
changes shape.
_expand_whatsapp_aliases stays private — it's an implementation detail
of how the mapping files are walked, not a contract callers should
depend on.
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.
Architecture:
browser <Terminal> (xterm.js)
│ onData ───► ws.send(keystrokes)
│ onResize ► ws.send('\x1b[RESIZE:cols;rows]')
│ write ◄── ws.onmessage (PTY bytes)
▼
FastAPI /api/pty (token-gated, loopback-only)
▼
PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent
Components
----------
hermes_cli/pty_bridge.py
Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
inherently byte-oriented and UTF-8 boundaries may land mid-read),
non-blocking select-based reads, TIOCSWINSZ resize, idempotent
SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
is WSL-supported only).
hermes_cli/web_server.py
@app.websocket("/api/pty") endpoint gated by the existing
_SESSION_TOKEN (via ?token= query param since browsers can't set
Authorization on WS upgrades). Loopback-only enforcement. Reader task
uses run_in_executor to pump PTY bytes without blocking the event
loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
before forwarding to the PTY. The endpoint resolves the TUI argv
through a _resolve_chat_argv hook so tests can inject fake commands
without building the real TUI.
Tests
-----
tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.
tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.
96 tests pass under scripts/run_tests.sh.
(cherry picked from commit 29b337bca70fc9efb082a5a852ea2cd5381af1a9)
feat(web): add Chat tab with xterm.js terminal + Sessions resume button
(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
against current main: BUILTIN_ROUTES table + plugin slot layout)
fix(tui): replace OSC 52 jargon in /copy confirmation
When the user ran /copy successfully, Ink confirmed with:
sent OSC52 copy sequence (terminal support required)
That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.
Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:
copied 1482 chars
(cherry picked from commit a0701b1d5a598dd1d3b94038a7bcbb2a3ab559fc)
docs: document the dashboard Chat tab
AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'
website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).
(cherry picked from commit 2c2e32cc4519973c77b63016316b065c0f656704)
feat(tui-gateway): transport-aware dispatch + WebSocket sidecar
Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.
- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
+ module-level `StdioTransport` fallback. The stdio stream resolves
through a lambda so existing tests that monkey-patch `_real_stdout`
keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
- `write_json` routes via session transport (for async events) →
contextvar transport (for in-request writes) → stdio fallback.
- `dispatch(req, transport=None)` binds the transport for the request
lifetime and propagates it to pool workers via `contextvars.copy_context`
so async handlers don't lose their sink.
- `_init_session` and the manual-session create path stash the
request's transport so out-of-band events (subagent.complete, etc.)
fan out to the right peer.
`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.
feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal
Composes the two transports into a single Chat tab:
┌─────────────────────────────────────────┬──────────────┐
│ xterm.js / PTY (emozilla #13379) │ ChatSidebar │
│ the literal hermes --tui process │ /api/ws │
└─────────────────────────────────────────┴──────────────┘
terminal bytes structured events
The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:
• model + provider badge with connection state (click → switch)
• running tool-call list (driven by tool.start / tool.progress /
tool.complete events)
• model picker dialog (gateway-driven, reuses ModelPickerDialog)
The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.
- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
Owns its GatewayClient, drives the model picker through
`slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
(`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.
Co-authored-by: emozilla <emozilla@nousresearch.com>
refactor(web): /clean pass on ChatSidebar + ChatPage lint debt
- ChatSidebar: lift gw out of useRef into a useMemo derived from a
reconnect counter. React 19's react-hooks/refs and react-hooks/
set-state-in-effect rules both fire when you touch a ref during
render or call setState from inside a useEffect body. The
counter-derived gw is the canonical pattern for "external resource
that needs to be replaceable on user action" — re-creating the
client comes from bumping `version`, the effect just wires + tears
down. Drops the imperative `gwRef.current = …` reassign in
reconnect, drops the truthy ref guard in JSX. modelLabel +
banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
so the effect body doesn't have to setState on first run. Drops
the unused react-hooks/exhaustive-deps eslint-disable. Adds a
scoped no-control-regex disable on the SGR mouse parser regex
(the \\x1b is intentional for xterm escape sequences).
All my-touched files now lint clean. Remaining warnings on web/
belong to pre-existing files this PR doesn't touch.
Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.
chore: uptick
fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary
The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.
Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.
feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast
The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.
Cross-process forwarding:
- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
(event_publisher.py, sync websockets client). entry.py installs the
tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
every dispatcher emit to a back-WS without disturbing Ink's stdio
handshake.
- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
(subscriber) endpoints with a per-channel registry. /api/pty now
accepts ?channel= and propagates the sidecar URL via env. start_server
also stashes app.state.bound_port so the URL is constructable.
- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
passes it to /api/pty and as a prop to ChatSidebar.
- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
tool.start/progress/complete back into the ToolCall list. Restores
the ToolCall primitive.
Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).
fix(web): address Copilot review on #14890
Five threads, all real:
- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
the open handshake. Server emits `gateway.ready` immediately after
accept, so a listener attached after the open promise could race past
the initial skin payload and lose it.
- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
WS into the existing error banner. 4401/4403 (auth/loopback reject)
surface as a "reload the page" message; mid-stream drops surface as
"events feed disconnected" with the existing reconnect button. Clean
unmount closes (1000/1001) stay silent.
- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
ptyprocess lives in the `pty` extra, not `web`. Switch to
`hermes-agent[web,pty]` in both prerequisite blocks.
- AGENTS.md: previous "never add a parallel React chat surface" guidance
was overbroad and contradicted this PR's sidebar. Tightened to forbid
re-implementing the transcript/composer/PTY terminal while explicitly
allowing structured supporting widgets (sidebar / model picker /
inspectors), matching the actual architecture.
- web/package-lock.json: regenerated cleanly so the wterm sibling
workspace paths (extraneous machine-local entries) stop polluting CI.
Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.
refactor(web): /clean pass on ChatSidebar events handler
Spotted in the round-2 review:
- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
hit the "unexpected drop" branch. Track `unmounting` in the effect
scope and gate the banner through a `surface()` helper so cleanup
closes stay silent.
- DRY the duplicated "events feed disconnected" string into a local
const used by both the error and close handlers.
- Drop the `opened` flag (no longer needed once the unmount guard is
the source of truth for "is this an expected close?").
## Problem
When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:
* botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
EndpointConnectionError / ConnectTimeoutError
* urllib3.exceptions.ProtocolError
* A bare AssertionError raised from inside urllib3 or botocore
(internal connection-pool invariant check)
The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.
The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:
⚠️ API call failed: AssertionError
📝 Error:
with no hint of what went wrong.
## Fix
Add two helpers to agent/bedrock_adapter.py:
* is_stale_connection_error(exc) — classifies exceptions that
indicate dead-client/dead-socket state. Matches botocore
ConnectionError + HTTPClientError subtrees, urllib3
ProtocolError / NewConnectionError, and AssertionError
raised from a frame whose module name starts with urllib3.,
botocore., or boto3.. Application-level AssertionErrors are
intentionally excluded.
* invalidate_runtime_client(region) — per-region counterpart to
the existing reset_client_cache(). Evicts a single cached
client so the next call rebuilds it (and its connection pool).
Wire both into the Converse call sites:
* call_converse() / call_converse_stream() in
bedrock_adapter.py (defense-in-depth for any future caller)
* The two direct client.converse(**kwargs) /
client.converse_stream(**kwargs) call sites in run_agent.py
(the paths the agent loop actually uses)
On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.
## Tests
tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):
* TestInvalidateRuntimeClient — per-region eviction correctness;
non-cached region returns False.
* TestIsStaleConnectionError — classifies botocore
ConnectionClosedError / EndpointConnectionError /
ReadTimeoutError, urllib3 ProtocolError, library-internal
AssertionError (both urllib3.* and botocore.* frames), and
correctly ignores application-level AssertionError and
unrelated exceptions (ValueError, KeyError).
* TestCallConverseInvalidatesOnStaleError — end-to-end: stale
error evicts the cached client, non-stale error (validation)
leaves it alone, successful call leaves it cached.
All 116 tests in test_bedrock_adapter.py pass.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>