hermes-agent/agent
ddupont 413ee1a286 feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.

## cua_backend.py

**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.

**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than AX attribute write --
WebKit AXTextFields reject attribute writes from backgrounded processes.

**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key) -- cua-driver actual tool names.

**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup which closes immediately when the app is
non-frontmost; set_value AX-presses the matching child option directly
(no menu required, no focus steal).

**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.

**list_apps**: fixed tool name from listApps to list_apps; handles plain-text
response via regex when structured data is absent.

**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.

**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.

## schema.py

Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.

## tool.py

Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png)
rather than hardcoding image/png.

## agent/display.py + run_agent.py

Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
2026-04-28 01:46:36 -07:00
..
transports fix(agent): preserve Codex message items for replay 2026-04-25 18:22:06 -07:00
__init__.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
account_usage.py feat(account-usage): add per-provider account limits module 2026-04-21 01:56:35 -07:00
anthropic_adapter.py feat(computer-use): cua-driver backend, universal any-model schema 2026-04-28 01:46:36 -07:00
auxiliary_client.py fix(providers/gmi): post-salvage review fixes 2026-04-27 11:17:59 -07:00
bedrock_adapter.py fix(bedrock): evict cached boto3 client on stale-connection errors 2026-04-24 07:26:07 -07:00
codex_responses_adapter.py fix(agent): preserve Codex message items for replay 2026-04-25 18:22:06 -07:00
context_compressor.py feat(computer-use): cua-driver backend, universal any-model schema 2026-04-28 01:46:36 -07:00
context_engine.py fix(compress): don't reach into ContextCompressor privates from /compress (#15039) 2026-04-24 02:55:43 -07:00
context_references.py fix(agent): fall back when rg is blocked for @folder references 2026-04-20 01:56:41 -07:00
copilot_acp_client.py fix: set HOME for Copilot ACP subprocesses 2026-04-24 05:09:08 -07:00
credential_pool.py fix(auth): hoist get_env_value import + strengthen .env fallback tests 2026-04-26 08:32:09 -07:00
credential_sources.py fix(auth): unify credential source removal — every source sticks (#13427) 2026-04-21 01:52:49 -07:00
display.py feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection 2026-04-28 01:46:36 -07:00
error_classifier.py feat(image-input): native multimodal routing based on model vision capability (#16506) 2026-04-27 06:27:59 -07:00
file_safety.py fix(security): apply file safety to copilot acp fs 2026-04-21 01:31:58 -07:00
gemini_cloudcode_adapter.py refactor: remove redundant local imports already available at module level 2026-04-21 00:50:58 -07:00
gemini_native_adapter.py fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133) 2026-04-24 05:35:17 -07:00
gemini_schema.py fix(gemini): drop integer/number/boolean enums from tool schemas (#15082) 2026-04-24 03:40:00 -07:00
google_code_assist.py fix(gemini-cli): surface MODEL_CAPACITY_EXHAUSTED cleanly + drop retired gemma-4-26b (#11833) 2026-04-17 15:34:12 -07:00
google_oauth.py feat(gemini): add Google Gemini CLI OAuth provider via Cloud Code Assist (free + paid tiers) (#11270) 2026-04-16 16:49:00 -07:00
image_gen_provider.py feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) 2026-04-21 21:30:10 -07:00
image_gen_registry.py feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) 2026-04-21 21:30:10 -07:00
image_routing.py feat(image-input): native multimodal routing based on model vision capability (#16506) 2026-04-27 06:27:59 -07:00
insights.py Merge branch 'main' into feat/dashboard-skill-analytics 2026-04-20 05:25:49 -07:00
manual_compression_feedback.py fix(gateway): make manual compression feedback truthful 2026-04-10 21:16:53 -07:00
memory_manager.py style: trim verbose comment blocks added by previous commit 2026-04-27 12:37:33 -07:00
memory_provider.py fix(memory): add write origin metadata 2026-04-24 14:37:55 -07:00
model_metadata.py feat(computer-use): cua-driver backend, universal any-model schema 2026-04-28 01:46:36 -07:00
models_dev.py fix: normalize provider in list_provider_models to support aliases 2026-04-23 01:59:20 -07:00
moonshot_schema.py fix(kimi,mcp): Moonshot schema sanitizer + MCP schema robustness (#14805) 2026-04-23 16:11:57 -07:00
nous_rate_guard.py fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898) 2026-04-26 04:53:42 -07:00
onboarding.py fix(openclaw-migration): case-preserving brand rewrite + one-time ~/.openclaw residue banner (#16327) 2026-04-26 20:57:26 -07:00
prompt_builder.py feat(computer-use): cua-driver backend, universal any-model schema 2026-04-28 01:46:36 -07:00
prompt_caching.py fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter 2026-03-21 16:54:43 -07:00
rate_limit_tracker.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
redact.py feat(security): make secret redaction off by default (#16794) 2026-04-27 21:24:08 -07:00
retry_utils.py feat(agent): add jittered retry backoff 2026-04-08 00:41:36 -07:00
shell_hooks.py fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322) 2026-04-26 20:48:35 -07:00
skill_commands.py fix(prompts): replace [SYSTEM: with [IMPORTANT: to avoid Azure content filter 2026-04-26 08:44:58 -07:00
skill_preprocessing.py fix(skills): apply inline shell in skill_view 2026-04-24 15:15:07 -07:00
skill_utils.py fix(skills): follow symlinks in iter_skill_index_files 2026-04-22 17:43:30 -07:00
subdirectory_hints.py fix(agent): catch PermissionError in subdirectory hint discovery 2026-04-09 03:10:30 -07:00
title_generator.py fix(title-gen): surface auxiliary failures via _emit_auxiliary_failure 2026-04-26 21:49:34 -07:00
trajectory.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
usage_pricing.py fix(usage): read top-level Anthropic cache fields from OAI-compatible proxies 2026-04-22 17:40:49 -07:00