hermes-agent/hermes_cli
Teknium ec671c4154
feat(image-input): native multimodal routing based on model vision capability (#16506)
* feat(image-input): native multimodal routing based on model vision capability

Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.

Routing decision (agent/image_routing.py::decide_image_input_mode):

  agent.image_input_mode = auto | native | text  (default: auto)

In auto mode:
  - If auxiliary.vision.provider/model is explicitly configured, keep the
    text pipeline (user paid for a dedicated vision backend).
  - Else if models.dev reports supports_vision=True for the active
    provider/model, attach natively.
  - Else fall back to text (current behaviour).

Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).

run_agent.py changes:
  - _prepare_anthropic_messages_for_api now passes image parts through
    unchanged when the model supports vision — the Anthropic adapter
    translates them to native image blocks. Previous behaviour
    (vision_analyze → text) only runs for non-vision Anthropic models.
  - New _prepare_messages_for_non_vision_model mirrors the same contract
    for chat.completions and codex_responses paths, so non-vision models
    on any provider get text-fallback instead of failing at the provider.
  - New _model_supports_vision() helper reads models.dev caps.

vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.

Config default: agent.image_input_mode = auto.

Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).

* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor

Two follow-ups that make the native image routing safer for long / heavy
sessions:

1) Oversize handling in build_native_content_parts:
   - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
     the most restrictive provider — Gemini inline data).
   - Delegates to vision_tools._resize_image_for_vision (Pillow-based,
     already battle-tested) to downscale to 5 MB first-try.
   - If Pillow is missing or resize still overshoots, the image is
     dropped and reported back in skipped[]; caller falls back to text
     enrichment for that image.

2) Image-token accounting in context_compressor:
   - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
     within the realistic range for Anthropic/GPT-4o/Gemini billing).
   - _content_length_for_budget() helper: sums text-part lengths and
     charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
     input_image part.  Base64 payload inside image_url is NOT counted
     as chars — dimensions don't matter, only image-presence.
   - Both tail-cut sites (_prune_old_tool_results L527 and
     _find_tail_cut_by_tokens L1126) now call the helper so multi-image
     conversations don't slip past compression budget.

Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).

* fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits

The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:

  * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
    'image exceeds 5 MB maximum' above that).
  * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
    complaint (empirically verified April 2026 with progressive PNG
    sizes).

New behaviour:

  * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
    ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
  * Providers NOT in the table get no ceiling — images attach at native
    size and we trust the provider to return its own error if it
    disagrees. A provider-specific 400 message is clearer than us
    guessing wrong and silently degrading image quality.
  * build_native_content_parts() gains a keyword-only provider arg;
    gateway/CLI/TUI pass the active provider so Anthropic users get
    auto-resize protection while OpenAI users don't pay it.
  * Resize target dropped from 5 MB to 4 MB to slide safely under
    Anthropic's boundary with header overhead.

Empirical measurements (direct API, no Hermes in the loop):

    image b64     anthropic   openrouter/gpt5.5   codex-oauth/gpt5.5
    0.19 MB       ✓           ✓                   ✓
    12.37 MB      ✗ 400 5MB   ✓                   ✓
    23.85 MB      ✗ 400 5MB   ✓                   ✓
    49.46 MB      ✗ 413       ✓                   ✓

Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.

* refactor(image-input): attempt native, shrink-and-retry on provider reject

Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.

Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.

Reactive design:
  - image_routing.py: _file_to_data_url encodes native size, no ceiling.
    build_native_content_parts drops its provider kwarg.
  - error_classifier.py: new FailoverReason.image_too_large + pattern
    match ("image exceeds", "image too large", etc.) checked BEFORE
    context_overflow so Anthropic's 5MB rejection lands in the right
    bucket.
  - run_agent.py: new _try_shrink_image_parts_in_messages walks api
    messages in-place, re-encodes oversized data: URL image parts
    through vision_tools._resize_image_for_vision to fit under 4MB,
    handles both chat.completions (dict image_url) and Responses
    (string image_url) shapes, ignores http URLs (provider-fetched).
    New image_shrink_retry_attempted flag in the retry loop fires the
    shrink exactly once per turn after credential-pool recovery but
    before auth retries.

E2E verified live against Anthropic claude-sonnet-4-6:
  - 17.9MB PNG (23.9MB b64) attached at native size
  - Anthropic returns 400 "image exceeds 5 MB maximum"
  - Agent logs '📐 Image(s) exceeded provider size limit — shrank and
    retrying...'
  - Retry succeeds, correct response delivered in 6.8s total.

Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
2026-04-27 06:27:59 -07:00
..
__init__.py chore: release v0.11.0 (2026.4.23) (#14791) 2026-04-23 15:31:59 -07:00
auth_commands.py Add native Spotify tools with PKCE auth 2026-04-24 05:20:38 -07:00
auth.py fix(auth): hoist get_env_value import + strengthen .env fallback tests 2026-04-26 08:32:09 -07:00
azure_detect.py feat(azure-foundry): auto-detect transport, models, context length 2026-04-25 18:48:43 -07:00
backup.py feat(update): auto-backup HERMES_HOME before hermes update (#16539) 2026-04-27 05:36:19 -07:00
banner.py feat(banner): hyperlink startup banner title to latest GitHub release (#14945) 2026-04-23 23:28:34 -07:00
callbacks.py fix: ESC cancels secret/sudo prompts, clearer skip messaging (#9902) 2026-04-14 16:11:37 -07:00
claw.py Normalize claw workspace paths for Windows 2026-04-22 18:15:27 -07:00
cli_output.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
clipboard.py feat: fix img pasting in new ink plus newline after tools 2026-04-11 13:14:32 -05:00
codex_models.py feat(codex): add gpt-5.5 and wire live model discovery into picker (#14720) 2026-04-23 13:32:43 -07:00
colors.py feat: respect NO_COLOR env var and TERM=dumb (#4079) 2026-03-30 17:07:21 -07:00
commands.py fix(cli): eliminate ghost status-bar + DSR input leaks from terminal drift 2026-04-27 05:31:47 -07:00
completion.py fix: preserve profile name completion in dynamic shell completion 2026-04-14 10:45:42 -07:00
config.py feat(image-input): native multimodal routing based on model vision capability (#16506) 2026-04-27 06:27:59 -07:00
copilot_auth.py fix(copilot): exchange raw GitHub token for Copilot API JWT 2026-04-24 05:09:08 -07:00
cron.py feat(cron): per-job workdir for project-aware cron runs (#15110) 2026-04-24 05:07:01 -07:00
curses_ui.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
debug.py fix(debug): sweep expired paste.rs uploads on a real timer (#16431) 2026-04-27 00:36:33 -07:00
default_soul.py fix: reset default SOUL.md to baseline identity text (#3159) 2026-03-26 01:34:27 -07:00
dingtalk_auth.py test(dingtalk): cover QR device-flow auth + OpenClaw branding disclosure 2026-04-17 05:08:07 -07:00
doctor.py fix(doctor): accept bare custom provider 2026-04-25 18:01:36 -07:00
dump.py fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133) 2026-04-24 05:35:17 -07:00
env_loader.py fix(cli): ensure project .env is sanitized before loading 2026-04-22 05:51:44 -07:00
fallback_cmd.py feat(cli): add 'hermes fallback' command to manage fallback providers (#16052) 2026-04-26 06:19:04 -07:00
gateway.py yuanbao platform (#16298) 2026-04-26 18:50:49 -07:00
hooks.py feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429) 2026-04-25 22:13:12 -07:00
logs.py feat: component-separated logging with session context and filtering (#7991) 2026-04-11 17:23:36 -07:00
main.py feat(update): auto-backup HERMES_HOME before hermes update (#16539) 2026-04-27 05:36:19 -07:00
mcp_config.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
memory_setup.py fix(memory): discover user-installed memory providers from $HERMES_HOME/plugins/ (#10529) 2026-04-15 14:25:40 -07:00
model_catalog.py feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033) 2026-04-26 05:46:43 -07:00
model_normalize.py fix(model-normalize): pass DeepSeek V-series IDs through instead of folding to deepseek-chat 2026-04-24 05:24:54 -07:00
model_switch.py fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844) 2026-04-25 18:47:53 -07:00
models.py fix(azure-foundry): auto-route gpt-5.x / codex / o-series to Responses API (#16361) 2026-04-26 21:33:31 -07:00
nous_subscription.py fix(cli): coerce use_gateway config flags in tool routing 2026-04-26 19:02:55 -07:00
oneshot.py feat(oneshot): add --model / --provider / HERMES_INFERENCE_MODEL (#15704) 2026-04-25 08:55:36 -07:00
pairing.py fix(pairing): handle null user_name in pairing list display 2026-04-23 02:34:11 -07:00
platforms.py yuanbao platform (#16298) 2026-04-26 18:50:49 -07:00
plugins_cmd.py feat(plugins): make all plugins opt-in by default 2026-04-20 04:46:45 -07:00
plugins.py docs(plugins): correct pre_gateway_dispatch doc text and add hooks.md section 2026-04-24 03:02:03 -07:00
profiles.py fix(profiles): stage profile imports to prevent directory clobbering 2026-04-23 03:02:34 -07:00
providers.py feat: Add Azure Foundry provider with OpenAI/Anthropic API mode selection 2026-04-25 18:48:43 -07:00
pty_bridge.py fix: mobile chat in new layout 2026-04-24 12:07:46 -04:00
runtime_provider.py fix(azure-foundry): auto-route gpt-5.x / codex / o-series to Responses API (#16361) 2026-04-26 21:33:31 -07:00
setup.py yuanbao platform (#16298) 2026-04-26 18:50:49 -07:00
skills_config.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
skills_hub.py feat(skills): install skills from a direct HTTP(S) URL (#16323) 2026-04-26 20:57:10 -07:00
skin_engine.py fix(skins): don't inherit status_bar_* into light-mode skins 2026-04-22 13:20:02 -07:00
slack_cli.py feat(slack): register every gateway command as a native slash (Discord/Telegram parity) (#16164) 2026-04-26 11:38:32 -07:00
status.py yuanbao platform (#16298) 2026-04-26 18:50:49 -07:00
timeouts.py refactor(timeouts): drop redundant ImportError in except clause 2026-04-26 20:48:20 -07:00
tips.py feat(busy): add 'steer' as a third display.busy_input_mode option (#16279) 2026-04-26 18:21:29 -07:00
tools_config.py fix(tools): coerce quoted use_gateway in image_gen UI detection 2026-04-26 19:02:55 -07:00
uninstall.py feat(uninstall): offer to remove named profiles when uninstalling from default 2026-04-18 19:18:13 -07:00
voice.py fix(tui): ignore SIGPIPE so stderr back-pressure can't kill the gateway 2026-04-23 16:18:15 -07:00
web_server.py fix(tui): run built TUI with production React by default 2026-04-26 21:34:31 -05:00
webhook.py feat(webhook): direct delivery mode for zero-LLM push notifications (#12473) 2026-04-19 05:18:19 -07:00