fix(desktop): pin computer-use to 1:1 screenshot/click coords (core#2200) #89
Reference in New Issue
Block a user
Delete Branch "fix/2200-desktop-coord-1to1"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Resolves the runtime half of core#2200 — computer-use desktop agents SEE a target in the screenshot but MISS the click.
Root cause
tool_desktop_screenshotcaptures the display at native pixels (scrot, no resize) andtool_desktop_clickclicks at native pixels (xdotool mousemove). Those two spaces are identical only if the model reasons over the screenshot at those same pixels. Claude's vision silently downscales any image above ~1.15 MP / 1568px long edge before the model sees it, so the 1920x1080 (2.07 MP) desktop desyncs screenshot-space from click-space — the model reads a coordinate off the downscaled image, xdotool clicks it in full-res space, and the click lands elsewhere. (The agent in the incident even guessed 2:1 / 4:1 scale factors and still missed.)Fix (this PR = runtime side)
tool_desktop_screenshotnow returns the exact pixel space (width,height,vision_safe) by parsing the PNG IHDR (no image lib). The agent never has to infer DPI: the coordinates it reads are the coordinatestool_desktop_clickconsumes.warninginstead of letting clicks silently miss._size_browser_windowsizes to 1280x800 (WXGA), matching the resolution the provisioner pins.The companion control-plane PR pins the Xvfb
:99display to 1280x800 (Anthropic's recommended computer-use resolution: 1.02 MP, 1280<1568 → no downscale → screenshot(x,y) == click(x,y) 1:1).Tests
_png_dimensionsparses IHDR / rejects non-PNG / missing filewidth/height+vision_safe: trueat 1280x800 (no warning)vision_safe: false+ warning at 1920x108014 passed.Follow-up (noted, not blocking): full auto downscale-to-safe + click scale-back for arbitrary display sizes (needs ImageMagick on the host).
Approve. RCA is correct — 1920x1080 screenshots exceed Claude vision downscale bound, desyncing pixel/click space. Surfacing width/height/vision_safe from tool_desktop_screenshot is the right fix; PNG IHDR parse + guard are sound. Tests cover both clauses incl boundary.
QA approve. 16 unit tests pass; boundary cases for each vision_safe clause added; browser-window 1280x800 assertion updated; JSON-contract (ok:true + path) preserved. Pairs with CP#516 display pin.