A coordinate-grounded visual inspection protocol for AI coding agents.
A programmable visual toolkit for AI agents. Inspect, navigate, crop, and sample large images — every command returns structured JSON, and generated views include coordinate mappings back to the source image.
$ vistools inspect screenshot.png
{
"ok": true,
"data": {
"source": { "width": 3200, "height": 2400, "format": "png", "size_bytes": 808243 },
"suggestion": {
"needs_overview": true,
"max_tile_rows": 2,
"max_tile_cols": 3,
"recommended_next": "overview",
"reason": "long side 3200 exceeds 1568 visual threshold",
"suggested_max_side": 1568
}
}
}
Without vistools:
Agent reads a 3200×2400 screenshot → details lost in compression
→ claims "the button looks correct" → no way to verify
With vistools:
1. inspect → 3200×2400, needs overview
2. overview --max-side 1200 → scaled preview, scale_factor = 0.375
3. Spot anomaly at overview (800, 600) → map to source: (2133, 1600)
4. viewport rect → exact crop of the region
5. sample --x 2133 --y 1600 → color is #e74c3c, not expected #2563eb
6. Report: "Button at source (2133, 1600) has incorrect background color"
When an AI agent (Claude Code, Cursor, Codex, a browser agent) is handed a large screenshot or design file, it usually sees the whole thing at once — compressed, zoomed out, and too expensive to process at full resolution. vistools gives agents the same tools a human uses: look at the overview, pick a region of interest, zoom in, read the details.
Three design choices drive everything:
- JSON-first. Every command outputs a CommandResult<T> envelope, success or failure, so agents can parse the same shape every time.
- Coordinate mapping. Every generated view includes a coordinate_mapping that describes how to translate output coordinates back to the source. An agent that finds a button in a crop knows exactly where it lives in the original.
- Agent-safe. The source file is never touched. Paths are sandboxed (no .. escape). Pixel limits (100 MP) and tile limits (64) keep runaway calls from blowing up.
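The agent-safe rules can be pictured as a pre-flight check. The following is a minimal Python sketch, not the project's Rust implementation: the helper name check_request and the order of the checks are assumptions, while the limits (100 MP, 64 tiles) and the error codes are the ones this README documents.

```python
from pathlib import PurePath

MAX_PIXELS = 100_000_000  # 100 MP limit stated in this README
MAX_TILES = 64            # tile limit stated in this README


def check_request(input_path, output_path, width, height, rows=1, cols=1):
    """Return a documented error code, or None if the request passes.

    Hypothetical helper: the order of the checks is an assumption.
    """
    for path in (input_path, output_path):
        # Reject any path that contains a ".." component.
        if path is not None and ".." in PurePath(path).parts:
            return "PATH_ESCAPE"
    if output_path is not None and PurePath(output_path) == PurePath(input_path):
        return "OUTPUT_SAME_AS_INPUT"
    if width * height > MAX_PIXELS:
        return "PIXEL_LIMIT_EXCEEDED"
    if rows * cols > MAX_TILES:
        return "INVALID_PARAMETERS"
    return None
```

A read-only command such as sample has no output file, so output_path can be passed as None.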
# In Claude Code:
/plugin install https://github.com/ZeroZ-lab/vistools-skills
# Then: /vistools screenshot.png

To install the CLI from source:
git clone https://github.com/ZeroZ-lab/vistools
cd vistools
cargo install --path crates/cli # installs to ~/.cargo/bin/vistools
Or build and run directly:
cargo build --release
./target/release/vistools <command>
The release binary is a single ~5 MB executable with no runtime dependencies.
inspect is the first thing to call on any unknown image. It reads only the header, so it's sub-millisecond.
vistools inspect large_screenshot.png
When the long side exceeds 1568 px (Claude's visual-model threshold), suggestion.recommended_next is overview; otherwise it is direct. max_tile_rows/max_tile_cols tell you how fine a grid to use if you need full coverage.
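The threshold rule is simple enough to mirror on the agent side, for example to predict the suggestion before calling the tool. This sketch covers only the rule as described here; the function name recommend_next is hypothetical, and how max_tile_rows/max_tile_cols are derived is not documented, so it is not reproduced.

```python
VISUAL_THRESHOLD = 1568  # long-side threshold quoted in this README


def recommend_next(width, height, threshold=VISUAL_THRESHOLD):
    """Return "overview" if the long side exceeds the threshold, else "direct"."""
    long_side = max(width, height)
    return "overview" if long_side > threshold else "direct"
```

In practice an agent would normally read data.suggestion.recommended_next from the inspect output rather than recompute it.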
vistools overview large_screenshot.png overview.png --max-side 1200
overview shrinks the image so the longest side fits max_side, preserves aspect ratio, and returns the scale_factor so you can map clicks in the overview back to the source.
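Mapping an overview point back to the source is a division by scale_factor. A minimal sketch, assuming the factor is max_side divided by the long side and that source coordinates are rounded to the nearest pixel (the rounding rule is an assumption, as is the behavior for images already smaller than max_side, which this sketch does not handle):

```python
def overview_scale(width, height, max_side):
    """Scale factor that fits the longest side into max_side, keeping aspect ratio."""
    return max_side / max(width, height)


def overview_to_source(x, y, scale_factor):
    """Map an overview pixel back to the nearest source pixel."""
    return round(x / scale_factor), round(y / scale_factor)
```

For a 3200×2400 source and --max-side 1200 the factor is 0.375, so overview point (800, 600) maps to source (2133, 1600).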
vistools tile large_screenshot.png --rows 2 --cols 3 --out-dir ./tiles
tile produces row-N-col-M.<ext> files. The last tile in each row/column absorbs the remainder pixels, so the tiles always cover the source exactly.
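The remainder rule can be illustrated with a small grid calculation. This is a sketch under assumptions: base tile sizes come from floor division and indices are zero-based, neither of which is confirmed here; only the "last tile absorbs the remainder" behavior is taken from the description.

```python
def tile_rects(width, height, rows, cols):
    """Return {(row, col): (x, y, w, h)}; the last row/column absorbs remainders."""
    base_w, base_h = width // cols, height // rows
    rects = {}
    for r in range(rows):
        for c in range(cols):
            x, y = c * base_w, r * base_h
            # The final column/row stretches to the image edge.
            w = width - x if c == cols - 1 else base_w
            h = height - y if r == rows - 1 else base_h
            rects[(r, c)] = (x, y, w, h)
    return rects
```

With a 3200×2400 source and a 2×3 grid, the first two columns are 1066 px wide and the last is 1068 px, so the widths sum to exactly 3200.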
viewport has three modes, all with the same output shape:
# Anchor-based (nine-position: top-left, center, bottom-right, ...)
vistools viewport anchor src.png crop.png --anchor top-right --width 800 --height 600
# Percentage-based (fractions of the source)
vistools viewport percent src.png crop.png --x 0.3 --y 0.3 --w 0.4 --h 0.4
# Pixel rectangle
vistools viewport rect src.png crop.png --x 1100 --y 200 --width 700 --height 700
Percent mode is strict: x/y/w/h must stay within 0..1, and x + w / y + h must not exceed 1.
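The percent-mode constraints translate directly into a validation step before converting to pixels. A hedged sketch: the function name percent_to_rect, the rounding to the nearest pixel, and the small eps tolerance for floating-point sums are all assumptions, not documented behavior.

```python
def percent_to_rect(src_w, src_h, x, y, w, h, eps=1e-9):
    """Validate fractional crop params and convert them to a pixel rect.

    Raises ValueError where the CLI would report an error. The rounding
    rule and the eps tolerance are assumptions of this sketch.
    """
    for name, value in (("x", x), ("y", y), ("w", w), ("h", h)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside 0..1")
    if x + w > 1.0 + eps or y + h > 1.0 + eps:
        raise ValueError("x + w and y + h must not exceed 1")
    return round(x * src_w), round(y * src_h), round(w * src_w), round(h * src_h)
```

For the percent example above on a 3200×2400 source, this yields the pixel rect (960, 720, 1280, 960).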
# Point color
vistools sample src.png --x 120 --y 80
# Average color and alpha stats for a region
vistools sample src.png --rect 100,80,40,40
Point mode returns rgba, rgb, lowercase hex, and alpha. Rect mode returns the rounded average color, alpha_stats (min, max, average, transparent_ratio), and pixel_count. sample is read-only and does not create an output image.
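To make the rect-mode fields concrete, here is a sketch of how such statistics could be computed from raw pixels. It assumes a plain (non-premultiplied) per-channel average, Python's built-in rounding, and that "transparent" means alpha equal to 0; the tool's exact definitions may differ.

```python
def sample_stats(pixels):
    """Summarize a non-empty list of (r, g, b, a) tuples, 0..255 per channel."""
    n = len(pixels)
    avg = tuple(round(sum(p[i] for p in pixels) / n) for i in range(4))
    alphas = [p[3] for p in pixels]
    return {
        "rgba": list(avg),
        "hex": "#{:02x}{:02x}{:02x}".format(*avg[:3]),  # lowercase hex
        "alpha_stats": {
            "min": min(alphas),
            "max": max(alphas),
            "average": sum(alphas) / n,
            # Assumption: "transparent" means alpha == 0.
            "transparent_ratio": sum(a == 0 for a in alphas) / n,
        },
        "pixel_count": n,
    }
```

A single opaque pixel (231, 76, 60, 255) summarizes to hex #e74c3c, the color from the walkthrough at the top of this README.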
# Compare full images
vistools diff expected.png actual.png
# Compare the same source-coordinate region in both images
vistools diff expected.png actual.png --rect 100,80,400,300
diff is read-only and returns pixel_count, changed_pixels, changed_ratio, mean_delta, max_delta, and an optional bounding_rect for changed pixels. The initial implementation requires both images to have the same dimensions and does not generate a diff image.
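The diff fields can be illustrated with a small pure-Python comparison. This sketch assumes a pixel's delta is its largest per-channel absolute difference and that any non-zero delta counts as changed; the README does not define the metric, so treat both as placeholders.

```python
def diff_metrics(expected, actual):
    """Compare two equally sized images given as rows of (r, g, b, a) tuples.

    Assumption: a pixel's delta is its largest per-channel absolute
    difference; the tool's real metric may differ.
    """
    if len(expected) != len(actual) or any(
        len(a) != len(b) for a, b in zip(expected, actual)
    ):
        raise ValueError("images must have the same dimensions")
    deltas, changed = [], []
    for y, (row_e, row_a) in enumerate(zip(expected, actual)):
        for x, (pe, pa) in enumerate(zip(row_e, row_a)):
            d = max(abs(ce - ca) for ce, ca in zip(pe, pa))
            deltas.append(d)
            if d > 0:
                changed.append((x, y))
    bounding = None  # stays None when nothing changed
    if changed:
        xs = [x for x, _ in changed]
        ys = [y for _, y in changed]
        bounding = {"x": min(xs), "y": min(ys),
                    "width": max(xs) - min(xs) + 1,
                    "height": max(ys) - min(ys) + 1}
    n = len(deltas)
    return {"pixel_count": n, "changed_pixels": len(changed),
            "changed_ratio": len(changed) / n if n else 0.0,
            "mean_delta": sum(deltas) / n if n else 0.0,
            "max_delta": max(deltas, default=0),
            "bounding_rect": bounding}
```

Identical images produce changed_pixels of 0 and a bounding_rect of None, which matches the field being optional.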
These commands are read-only and return structured JSON for photographic inspection:
vistools histogram src.jpg --rgb
vistools zone-map src.jpg
vistools exposure src.jpg --mode evaluative
vistools focus-map src.jpg --rows 3 --cols 4
vistools white-balance src.jpg
- histogram --rgb adds per-channel R/G/B histograms without changing the default luminance-only output.
- zone-map maps luminance into Zone System 0..X, with per-zone ratios and representative source coordinates.
- exposure estimates ev, supports evaluative, spot, center-weighted, and highlight-weighted, and classifies the result as under/correct/over.
- focus-map splits the image or --rect region into an N x M grid and returns per-cell sharpness, the best_cell, and a focus_point you can drill into with viewport.
- white-balance estimates gray-world R/G/B gains and reports warm/cool and green/magenta bias without outputting a corrected image.
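For readers unfamiliar with the gray-world method that white-balance refers to, the textbook version scales each channel so its mean matches the overall mean. The sketch below shows that standard formulation only; how vistools samples pixels or turns gains into warm/cool and green/magenta labels is not specified here.

```python
def gray_world_gains(pixels):
    """Estimate gray-world R/G/B gains from a non-empty list of (r, g, b) tuples."""
    n = len(pixels)
    means = [sum(p[i] for p in pixels) / n for i in range(3)]
    gray = sum(means) / 3  # target mean shared by all channels
    # A channel whose mean is zero gets gain 1.0 to avoid dividing by zero.
    return tuple(gray / m if m else 1.0 for m in means)
```

A neutral gray image yields gains of (1.0, 1.0, 1.0); a green-heavy image yields a green gain below 1 and red/blue gains above 1.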
vistools --help # list all commands with brief description
vistools inspect --help # detailed help for a subcommand
vistools --version # print version (e.g. "vistools 0.2.0")
Every command, success or failure, returns the same envelope on stdout:
{
"ok": true,
"operation": "viewport",
"input": "src.png",
"data": {
"output": "crop.png",
"source": { "width": 3200, "height": 2400, "format": "png", "size_bytes": 808243 },
"crop": {
"mode": "anchor",
"region": { "x": 2200, "y": 0, "width": 1000, "height": 600 },
"params": { "anchor": "TopRight", "width": 1000, "height": 600 }
},
"result": { "width": 1000, "height": 600 },
"coordinate_mapping": {
"crop_origin_in_source": [2200, 0],
"scale_factor": null,
"formula": "source_x = result_x + 2200, source_y = result_y"
}
},
"warnings": [],
"elapsed_ms": 12
}
On failure, ok is false, data is absent, and error carries a stable machine-readable code:
{
"ok": false,
"operation": "inspect",
"input": "/tmp/nope.png",
"error": { "code": "FILE_NOT_FOUND", "message": "input file not found: /tmp/nope.png" },
"warnings": [],
"elapsed_ms": 0
}
The process also exits non-zero on failure.
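Because the envelope shape is the same for success and failure, a caller can parse stdout first and branch on ok afterwards. A minimal sketch of such a wrapper, assuming vistools is on PATH; VistoolsError, parse_envelope, run_vistools, and the "UNKNOWN" fallback code are hypothetical names introduced for illustration.

```python
import json
import subprocess


class VistoolsError(Exception):
    """Raised when an envelope reports ok == false (hypothetical wrapper type)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def parse_envelope(stdout_text):
    """Return the data payload of a successful envelope, else raise VistoolsError."""
    envelope = json.loads(stdout_text)
    if not envelope.get("ok"):
        err = envelope.get("error", {})
        # "UNKNOWN" is a fallback of this sketch, not a documented code.
        raise VistoolsError(err.get("code", "UNKNOWN"), err.get("message", ""))
    return envelope["data"]


def run_vistools(*args):
    """Run the CLI and parse its envelope.

    check=True is deliberately not used: a failing command exits non-zero
    but still prints the envelope on stdout.
    """
    proc = subprocess.run(["vistools", *args], capture_output=True, text=True)
    return parse_envelope(proc.stdout)
```

A call such as run_vistools("inspect", "screenshot.png") would then return the data object, and callers can branch on the code attribute of the raised error.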
| Code | Meaning |
|---|---|
| FILE_NOT_FOUND | Input file does not exist or is not a regular file |
| UNSUPPORTED_FORMAT | Image decoder could not read the file |
| INVALID_DIMENSIONS | Zero width/height passed to a command |
| INVALID_COORDINATES | Viewport/sample point or rect exceeds source bounds |
| INVALID_PARAMETERS | Tile count > 64, zero max side, malformed sample mode, etc. |
| OUTPUT_WRITE_ERROR | Could not write the output file |
| PATH_ESCAPE | Path contains .. |
| OUTPUT_SAME_AS_INPUT | Output would overwrite the source |
| PIXEL_LIMIT_EXCEEDED | Source exceeds 100 megapixels |
1. inspect src.png # big image? what's the suggested grid?
│
▼ needs_overview=true
2. overview src.png overview.png --max-side 1200
│
▼ find region of interest in the overview
3a. tile src.png --rows 2 --cols 3 --out-dir ./tiles
│
▼ or, if you know the area:
3b. viewport anchor src.png crop.png --anchor top-right --width 800 --height 600
│
▼ coordinate_mapping tells you where (100, 50) in the crop lives in src.png
4. sample src.png --x 2500 --y 50
│
▼ inspect the exact color/alpha at the source coordinate
5. agent acts on the crop
The coordinate_mapping.formula string is the machine-readable recipe:
source_x = result_x + 2200, source_y = result_y # crop
source_x = result_x / 0.375000, source_y = result_y / 0.375000 # overview
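Rather than parsing the formula string, an agent can apply the structured fields of coordinate_mapping directly. This sketch assumes a null or missing crop_origin_in_source means (0, 0) and a null scale_factor means 1.0; the crop envelope above shows scale_factor as null, but the shape of an overview mapping is not shown in this README, so that half is an assumption.

```python
def to_source(mapping, x, y):
    """Translate a point in a generated view back to source pixel coordinates.

    Assumes a missing or null crop_origin_in_source means (0, 0) and a null
    scale_factor means 1.0; neither default is confirmed by the README.
    """
    origin_x, origin_y = mapping.get("crop_origin_in_source") or (0, 0)
    scale = mapping.get("scale_factor") or 1.0
    return round(x / scale) + origin_x, round(y / scale) + origin_y
```

With the crop mapping shown earlier (origin [2200, 0]), crop point (100, 50) maps to source (2300, 50); with a scale_factor of 0.375, overview point (800, 600) maps to (2133, 1600).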
Skills are maintained in a separate repo: ZeroZ-lab/vistools-skills.
# Claude Code — install from the skills-only repo
/plugin install https://github.com/ZeroZ-lab/vistools-skills
# Then use: /vistools screenshot.png
Supports Claude Code, Cursor, and Codex.
cargo build # debug
cargo build --release # release (~5 MB, LTO + stripped)
cargo test # unit + integration
cargo clippy --all-targets -- -D warnings
cargo fmt --check
Supported input formats: PNG, JPEG, WebP, TIFF, BMP, GIF. Output format is inferred from the output file's extension.
vistools/
├── crates/
│ ├── core/ # library: types, guard, coord, one module per command
│ └── cli/ # thin clap wrapper + integration tests
├── fixtures/ # unit-test images (64x64, 256x256, 1000x1000)
│ └── e2e/ # real-world test images
└── docs/ # design decisions (project.md), timeline, contracts
MIT / Apache-2.0, at your option.