hermes-agent/tests/cli
Teknium f0dc919f92
fix(compression): include system prompt + tool schemas in token estimates (#18265)
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.

Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
  firing on 10K+ tokens of real pressure, confusing users about why
  compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
  next should_compress() check compares real usage against a stale
  underestimate — compression triggers late, potentially past the model's
  context limit on small-context models (#14695).

Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.

Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
  call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after

Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
  — no agent instance is in scope to query for system prompt/tools, and the
  existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
  system prompt via api_messages[0].

Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.

Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).

Closes #14695
Closes #6217
2026-04-30 23:03:54 -07:00
..
__init__.py
test_branch_command.py feat(memory): notify providers on mid-process session_id rotation (#17409) 2026-04-29 04:57:22 -07:00
test_busy_input_mode_command.py feat(busy): add 'steer' as a third display.busy_input_mode option (#16279) 2026-04-26 18:21:29 -07:00
test_cli_approval_ui.py fix(cli): wire approvals in background tasks 2026-04-26 12:29:48 -07:00
test_cli_background_tui_refresh.py
test_cli_bracketed_paste_sanitizer.py fix(cli): strip leaked bracketed-paste wrappers 2026-04-26 21:47:40 -07:00
test_cli_browser_connect.py fix(browser): address Copilot review on /browser connect 2026-04-28 22:11:10 -07:00
test_cli_context_warning.py
test_cli_copy_command.py
test_cli_extension_hooks.py
test_cli_external_editor.py
test_cli_file_drop.py
test_cli_force_redraw.py fix(cli): eliminate ghost status-bar + DSR input leaks from terminal drift 2026-04-27 05:31:47 -07:00
test_cli_image_command.py
test_cli_init.py fix(context): honor model.context_length for Ollama num_ctx and all display paths 2026-04-30 04:31:23 -07:00
test_cli_interrupt_subagent.py
test_cli_loading_indicator.py fix: clean up defensive shims and finish CI stabilization from #17660 (#17801) 2026-04-29 23:53:17 -07:00
test_cli_markdown_rendering.py
test_cli_mcp_config_watch.py
test_cli_new_session.py
test_cli_prefix_matching.py
test_cli_preloaded_skills.py
test_cli_provider_resolution.py
test_cli_reload_skills.py refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool 2026-04-29 21:07:47 -07:00
test_cli_retry.py
test_cli_save_config_value.py
test_cli_secret_capture.py
test_cli_shutdown_memory_messages.py fix(cli): pass session messages to shutdown_memory_provider (#15165 sibling) 2026-04-27 06:41:16 -07:00
test_cli_skin_integration.py fix(tui): restore macOS copy behavior and theme polish (#17131) 2026-04-28 18:47:14 -05:00
test_cli_status_bar.py
test_cli_status_command.py
test_cli_steer_busy_path.py
test_cli_terminal_response_sanitizer.py fix(cli): tighten mouse leak sanitizer 2026-04-29 22:10:18 -05:00
test_cli_tools_command.py
test_cli_user_message_preview.py
test_compress_focus.py
test_cprint_bg_thread.py fix(cli): surface self-improvement review summaries from bg thread 2026-04-30 14:07:22 -07:00
test_cwd_env_respect.py
test_fast_command.py feat(fast): broaden /fast whitelist to all OpenAI + Anthropic models (#16883) 2026-04-28 00:44:43 -07:00
test_gquota_command.py
test_manual_compress.py fix(compression): include system prompt + tool schemas in token estimates (#18265) 2026-04-30 23:03:54 -07:00
test_personality_none.py
test_quick_commands.py
test_reasoning_command.py
test_resume_display.py
test_save_conversation_location.py fix(sessions): /save lands under $HERMES_HOME, widen browse+TUI picker, force-refresh ollama-cloud on setup (#16296) 2026-04-26 18:49:48 -07:00
test_session_boundary_hooks.py
test_stream_delta_think_tag.py
test_surrogate_sanitization.py
test_tool_progress_scrollback.py
test_worktree_security.py
test_worktree.py