hermes-agent

History

Teknium ec671c4154 feat(image-input): native multimodal routing based on model vision capability (#16506 ) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto \| native \| text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.		2026-04-27 06:27:59 -07:00
..
acp	fix(acp): include MCP toolsets in ACP sessions	2026-04-24 03:04:42 -07:00
agent	feat(image-input): native multimodal routing based on model vision capability (#16506 )	2026-04-27 06:27:59 -07:00
cli	fix(cli): eliminate ghost status-bar + DSR input leaks from terminal drift	2026-04-27 05:31:47 -07:00
cron	fix(cron): don't silently disable recurring cron jobs when croniter is missing (#16368 )	2026-04-26 21:47:32 -07:00
e2e	test(discord): add guild to fake e2e messages	2026-04-25 18:25:56 -07:00
environments/benchmarks
fakes
gateway	fix(telegram): accept /cmd@botname from bot menu in groups	2026-04-26 22:00:18 -07:00
hermes_cli	feat(update): auto-backup HERMES_HOME before hermes update (#16539 )	2026-04-27 05:36:19 -07:00
hermes_state	fix(resume): redirect --resume to the descendant that actually holds the messages	2026-04-24 03:04:42 -07:00
honcho_plugin
integration
plugins	feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364 )	2026-04-27 06:22:25 -07:00
run_agent	feat(image-input): native multimodal routing based on model vision capability (#16506 )	2026-04-27 06:27:59 -07:00
skills	fix(skills): honor scope query from Google OAuth redirect URL	2026-04-26 21:08:19 -07:00
tools	fix(file-tools): escalate to BLOCKED on repeated read_file dedup stubs (#16382 )	2026-04-27 00:17:26 -07:00
tui_gateway	Revert "feat(onboarding): port first-touch hints to the TUI (#16054 )" (#16062 )	2026-04-26 06:31:37 -07:00
website	fix(website): auto-wrap ASCII-art code blocks in generated skill pages (#16497 )	2026-04-27 03:38:39 -07:00
__init__.py
conftest.py	test: blank platform-gating env vars in hermetic fixture	2026-04-26 12:23:20 -07:00
run_interrupt_test.py
test_account_usage.py	feat(account-usage): add per-provider account limits module	2026-04-21 01:56:35 -07:00
test_base_url_hostname.py	security(runtime_provider): close OLLAMA_API_KEY substring-leak sweep miss (#13522 )	2026-04-21 06:06:16 -07:00
test_batch_runner_checkpoint.py	test: regression coverage for checkpoint dedup and inf/nan coercion	2026-04-24 14:32:21 -07:00
test_cli_file_drop.py	fix(tui): improve macOS paste and shortcut parity	2026-04-21 08:00:00 -07:00
test_cli_skin_integration.py	fix: align status bar skin tests with upstream main	2026-04-22 13:20:02 -07:00
test_ctx_halving_fix.py
test_empty_model_fallback.py
test_evidence_store.py
test_hermes_constants.py
test_hermes_logging.py	fix(logging): attach gateway log after cli init	2026-04-26 19:01:26 -07:00
test_hermes_state.py	Merge remote-tracking branch 'origin/main' into bb/tui-long-session-perf	2026-04-26 21:07:15 -05:00
test_honcho_client_config.py
test_ipv4_preference.py
test_mcp_serve.py
test_mini_swe_runner.py	fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157 )	2026-04-20 12:23:05 -07:00
test_minimax_model_validation.py	fix(models): validate MiniMax models against static catalog (#12611 , #12460 , #12399 , #12547 )	2026-04-19 22:44:47 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py
test_model_tools_async_bridge.py	fix(core): ensure non-blocking executor shutdown on async timeout	2026-04-22 14:42:32 -07:00
test_model_tools.py	feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429 )	2026-04-25 22:13:12 -07:00
test_ollama_num_ctx.py
test_packaging_metadata.py
test_plugin_skills.py
test_project_metadata.py
test_retry_utils.py
test_sql_injection.py
test_subprocess_home_isolation.py
test_timezone.py
test_toolset_distributions.py
test_toolsets.py	feat(discord): split discord_server into discord + discord_admin tools	2026-04-25 04:50:14 -07:00
test_trajectory_compressor_async.py	fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157 )	2026-04-20 12:23:05 -07:00
test_trajectory_compressor.py	fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157 )	2026-04-20 12:23:05 -07:00
test_transform_tool_result_hook.py	test: stop testing mutable data — convert change-detectors to invariants (#13363 )	2026-04-20 23:20:33 -07:00
test_tui_gateway_server.py	fix(tui): restore resumed transcript lineage	2026-04-26 15:16:12 -05:00
test_utils_truthy_values.py
test_yuanbao_integration.py	yuanbao platform (#16298 )	2026-04-26 18:50:49 -07:00
test_yuanbao_markdown.py	yuanbao platform (#16298 )	2026-04-26 18:50:49 -07:00
test_yuanbao_pipeline.py	yuanbao platform (#16298 )	2026-04-26 18:50:49 -07:00
test_yuanbao_proto.py	yuanbao platform (#16298 )	2026-04-26 18:50:49 -07:00