vistools

A coordinate-grounded visual inspection protocol for AI coding agents.

A programmable visual toolkit for AI agents. Inspect, navigate, crop, and sample large images — every command returns structured JSON, and generated views include coordinate mappings back to the source image.

$ vistools inspect screenshot.png
{
  "ok": true,
  "data": {
    "source": { "width": 3200, "height": 2400, "format": "png", "size_bytes": 808243 },
    "suggestion": {
      "needs_overview": true,
      "max_tile_rows": 2,
      "max_tile_cols": 3,
      "recommended_next": "overview",
      "reason": "long side 3200 exceeds 1568 visual threshold",
      "suggested_max_side": 1568
    }
  }
}

Before vs After

Without vistools:

Agent reads a 3200×2400 screenshot → details lost in compression
  → claims "the button looks correct" → no way to verify

With vistools:

1. inspect → 3200×2400, needs overview
2. overview --max-side 1200 → scaled preview, scale_factor = 0.375
3. Spot anomaly at overview (800, 600) → map to source: (2133, 1600)
4. viewport rect → exact crop of the region
5. sample --x 2133 --y 1600 → color is #e74c3c, not expected #2563eb
6. Report: "Button at source (2133, 1600) has incorrect background color"

Why

When an AI agent (Claude Code, Cursor, Codex, a browser agent) is handed a large screenshot or design file, it usually sees the whole thing at once: downscaled and compressed, because processing it at full resolution is too expensive. vistools gives agents the same workflow a human uses: look at the overview, pick a region of interest, zoom in, read the details.

Three design choices drive everything:

JSON-first. Every command outputs a CommandResult<T> envelope — success or failure — so agents can parse the same shape every time.
Coordinate mapping. Every generated view includes a coordinate_mapping that describes how to translate output coordinates back to the source. An agent that finds a button in a crop knows exactly where it lives in the original.
Agent-safe. The source file is never modified. Paths are sandboxed (no .. escape). A source-size limit (100 megapixels) and a tile-count limit (64) bound the cost of any single call.

Install

Claude Code Plugin (recommended)

# In Claude Code:
/plugin install https://github.com/ZeroZ-lab/vistools-skills
# Then: /vistools screenshot.png

From source (Rust 1.88+)

git clone https://github.com/ZeroZ-lab/vistools
cd vistools
cargo install --path crates/cli   # installs to ~/.cargo/bin/vistools

Or build and run directly:

cargo build --release
./target/release/vistools <command>

The release binary is a single ~5 MB executable with no runtime dependencies.

Commands

`inspect` — metadata + strategy hint

The first thing to call on any unknown image. Reads only the header, so it's sub-millisecond.

vistools inspect large_screenshot.png

When the long side exceeds 1568 px (Claude's visual-model threshold), suggestion.recommended_next is overview; otherwise it is direct. max_tile_rows/max_tile_cols tell you how fine a grid to use if you need full coverage.
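
As a rough illustration of that rule, the sketch below derives a suggestion block from image dimensions alone. The function name `suggest` and the ceil-based grid rule are assumptions: the rule reproduces the 3200×2400 example shown earlier, but it is not taken from the vistools source.

```python
import math

VISUAL_THRESHOLD = 1568  # long-side threshold quoted in the text above


def suggest(width: int, height: int, threshold: int = VISUAL_THRESHOLD) -> dict:
    """Approximate the `suggestion` block from image dimensions.

    Assumption: the grid is ceil(dimension / threshold) per axis. This
    matches the documented example, but the real logic may differ.
    """
    needs_overview = max(width, height) > threshold
    return {
        "needs_overview": needs_overview,
        "recommended_next": "overview" if needs_overview else "direct",
        "max_tile_rows": math.ceil(height / threshold),
        "max_tile_cols": math.ceil(width / threshold),
        "suggested_max_side": threshold,
    }
```

For a 3200×2400 input this yields a 2×3 grid and `recommended_next` of overview, the same values as the sample output at the top of this README.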

`overview` — scaled-down preview

vistools overview large_screenshot.png overview.png --max-side 1200

Shrinks so the longest side fits max_side, preserves aspect ratio, and returns the scale_factor so you can map clicks in the overview back to the source.
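
The arithmetic behind scale_factor can be sketched as follows. `overview_geometry` is a hypothetical helper, and both the rounding of output dimensions and the no-upscaling clamp are assumptions rather than documented behavior.

```python
def overview_geometry(width: int, height: int, max_side: int) -> dict:
    """Compute the scale factor and output size for an overview.

    Assumption: an image that already fits within max_side is left at
    scale 1.0 instead of being upscaled.
    """
    if max_side <= 0:
        raise ValueError("max_side must be positive")
    scale = min(1.0, max_side / max(width, height))
    return {
        "scale_factor": scale,
        "width": round(width * scale),
        "height": round(height * scale),
    }
```

With a 3200×2400 source and --max-side 1200, this gives a scale factor of 0.375 and a 1200×900 overview.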

`tile` — grid split

vistools tile large_screenshot.png --rows 2 --cols 3 --out-dir ./tiles

Produces row-N-col-M.<ext> files. The last tile in each row/column absorbs the remainder pixels, so the tiles always cover the source exactly.
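
The remainder rule can be illustrated with a small sketch. `tile_rects` is a hypothetical helper, and the use of integer division for the base tile size is an assumption; only the "last tile absorbs the remainder" behavior and the 64-tile limit come from this README.

```python
def tile_rects(width: int, height: int, rows: int, cols: int) -> list[dict]:
    """Split a width x height image into rows x cols rectangles.

    The last column and last row absorb leftover pixels, so the
    rectangles cover the source exactly with no gaps or overlap.
    """
    if rows < 1 or cols < 1 or rows * cols > 64:
        raise ValueError("rows and cols must be >= 1 and rows * cols <= 64")
    base_w, base_h = width // cols, height // rows
    rects = []
    for r in range(rows):
        for c in range(cols):
            # Interior tiles use the base size; edge tiles take what is left.
            w = width - base_w * c if c == cols - 1 else base_w
            h = height - base_h * r if r == rows - 1 else base_h
            rects.append({"row": r, "col": c,
                          "x": c * base_w, "y": r * base_h,
                          "width": w, "height": h})
    return rects
```

For a 3200×2400 source split 2×3, the first two columns are 1066 px wide and the last is 1068 px, which keeps the total at 3200.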

`viewport` — crop a region

Three modes, same output shape:

# Anchor-based (nine-position: top-left, center, bottom-right, ...)
vistools viewport anchor src.png crop.png --anchor top-right --width 800 --height 600

# Percentage-based (fractions of the source)
vistools viewport percent src.png crop.png --x 0.3 --y 0.3 --w 0.4 --h 0.4

# Pixel rectangle
vistools viewport rect src.png crop.png --x 1100 --y 200 --width 700 --height 700

Percent mode is strict: x, y, w, and h must each lie within 0..1, and neither x + w nor y + h may exceed 1.
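
A caller can check percent arguments before invoking the command. The sketch below is a client-side approximation: `percent_to_rect`, the small floating-point tolerance, and the rounding to whole pixels are all assumptions, not the tool's documented behavior.

```python
def percent_to_rect(src_w: int, src_h: int,
                    x: float, y: float, w: float, h: float,
                    eps: float = 1e-9) -> dict:
    """Validate percent-mode arguments and convert them to a pixel rect."""
    for name, value in (("x", x), ("y", y), ("w", w), ("h", h)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within 0..1, got {value}")
    # eps absorbs float noise such as 0.7 + 0.3 not being exactly 1.0
    if x + w > 1.0 + eps or y + h > 1.0 + eps:
        raise ValueError("x + w and y + h must not exceed 1")
    return {
        "x": round(x * src_w),
        "y": round(y * src_h),
        "width": round(w * src_w),
        "height": round(h * src_h),
    }
```

For the percent example above on a 3200×2400 source, this produces a 1280×960 region at (960, 720).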

`sample` — point and region color picker

# Point color
vistools sample src.png --x 120 --y 80

# Average color and alpha stats for a region
vistools sample src.png --rect 100,80,40,40

Point mode returns rgba, rgb, lowercase hex, and alpha. Rect mode returns the rounded average color, alpha_stats (min, max, average, transparent_ratio), and pixel_count. sample is read-only and does not create an output image.
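
To make the rect-mode statistics concrete, here is a minimal sketch over a list of RGBA tuples. The key name `average_rgb`, the unweighted mean, and the definition of "transparent" as alpha == 0 are assumptions; only alpha_stats, its four members, and pixel_count are named in this README.

```python
def region_stats(pixels: list[tuple[int, int, int, int]]) -> dict:
    """Summarize a region given as a flat list of (r, g, b, a) pixels."""
    n = len(pixels)
    if n == 0:
        raise ValueError("region must contain at least one pixel")
    # Plain per-channel mean, rounded to the nearest integer.
    r, g, b = (round(sum(p[i] for p in pixels) / n) for i in range(3))
    alphas = [p[3] for p in pixels]
    return {
        "average_rgb": [r, g, b],          # hypothetical key name
        "hex": f"#{r:02x}{g:02x}{b:02x}",  # lowercase hex, as in point mode
        "alpha_stats": {
            "min": min(alphas),
            "max": max(alphas),
            "average": sum(alphas) / n,
            "transparent_ratio": sum(1 for a in alphas if a == 0) / n,
        },
        "pixel_count": n,
    }
```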

`diff` — compare two images

# Compare full images
vistools diff expected.png actual.png

# Compare the same source-coordinate region in both images
vistools diff expected.png actual.png --rect 100,80,400,300

diff is read-only and returns pixel_count, changed_pixels, changed_ratio, mean_delta, max_delta, and an optional bounding_rect for changed pixels. The initial implementation requires both images to have the same dimensions and does not generate a diff image.
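
The reported fields can be understood through a small reference computation. In this sketch, images are nested lists of RGBA tuples, and the per-pixel delta is taken as the largest absolute channel difference. That delta definition is an assumption; the README does not say how vistools measures it.

```python
def diff_stats(expected: list[list[tuple]], actual: list[list[tuple]]) -> dict:
    """Compare two same-sized images given as rows of (r, g, b, a) pixels."""
    if len(expected) != len(actual) or any(
        len(re_) != len(ra) for re_, ra in zip(expected, actual)
    ):
        raise ValueError("images must have the same dimensions")
    pixel_count = sum(len(row) for row in expected)
    changed, total_delta, max_delta = 0, 0, 0
    xs, ys = [], []
    for y, (row_e, row_a) in enumerate(zip(expected, actual)):
        for x, (pe, pa) in enumerate(zip(row_e, row_a)):
            delta = max(abs(ce - ca) for ce, ca in zip(pe, pa))
            if delta:
                changed += 1
                xs.append(x)
                ys.append(y)
            total_delta += delta
            max_delta = max(max_delta, delta)
    bounding = None  # stays None when nothing changed
    if changed:
        bounding = {"x": min(xs), "y": min(ys),
                    "width": max(xs) - min(xs) + 1,
                    "height": max(ys) - min(ys) + 1}
    return {
        "pixel_count": pixel_count,
        "changed_pixels": changed,
        "changed_ratio": changed / pixel_count if pixel_count else 0.0,
        "mean_delta": total_delta / pixel_count if pixel_count else 0.0,
        "max_delta": max_delta,
        "bounding_rect": bounding,
    }
```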

Photography metrics

These commands are read-only and return structured JSON for photographic inspection:

vistools histogram src.jpg --rgb
vistools zone-map src.jpg
vistools exposure src.jpg --mode evaluative
vistools focus-map src.jpg --rows 3 --cols 4
vistools white-balance src.jpg

histogram --rgb adds per-channel R/G/B histograms without changing the default luminance-only output.
zone-map maps luminance into Zone System zones 0 through X, with per-zone ratios and representative source coordinates.
exposure estimates ev, supports the evaluative, spot, center-weighted, and highlight-weighted modes, and classifies the result as under / correct / over.
focus-map splits the image or --rect region into an N x M grid and returns per-cell sharpness, the best_cell, and a focus_point you can drill into with viewport.
white-balance estimates gray-world R/G/B gains and reports warm / cool and green / magenta bias without outputting a corrected image.
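
For readers unfamiliar with the gray-world method, the sketch below shows one common formulation: each channel's gain is the overall mean divided by that channel's mean. `gray_world_gains` is a hypothetical helper, and vistools may normalize its gains differently (for example, relative to green).

```python
def gray_world_gains(pixels: list[tuple[int, int, int]]) -> tuple[float, float, float]:
    """Estimate R/G/B gains under the gray-world assumption.

    The assumption is that the scene averages to neutral gray, so scaling
    each channel by gray / channel_mean equalizes the three means.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("need at least one pixel")
    means = [sum(p[i] for p in pixels) / n for i in range(3)]
    if any(m == 0 for m in means):
        raise ValueError("a channel with zero mean has no defined gain")
    gray = sum(means) / 3
    return tuple(gray / m for m in means)
```

A red gain above 1 means red is underrepresented relative to the other channels; no corrected image is produced here either.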

Help & version

vistools --help              # list all commands with brief description
vistools inspect --help      # detailed help for a subcommand
vistools --version           # print version (e.g. "vistools 0.2.0")

JSON output

Every command — success or failure — returns the same envelope on stdout:

{
  "ok": true,
  "operation": "viewport",
  "input": "src.png",
  "data": {
    "output": "crop.png",
    "source": { "width": 3200, "height": 2400, "format": "png", "size_bytes": 808243 },
    "crop": {
      "mode": "anchor",
      "region": { "x": 2200, "y": 0, "width": 1000, "height": 600 },
      "params": { "anchor": "TopRight", "width": 1000, "height": 600 }
    },
    "result": { "width": 1000, "height": 600 },
    "coordinate_mapping": {
      "crop_origin_in_source": [2200, 0],
      "scale_factor": null,
      "formula": "source_x = result_x + 2200, source_y = result_y"
    }
  },
  "warnings": [],
  "elapsed_ms": 12
}

On failure, ok is false, data is absent, and error carries a stable machine-readable code:

{
  "ok": false,
  "operation": "inspect",
  "input": "/tmp/nope.png",
  "error": { "code": "FILE_NOT_FOUND", "message": "input file not found: /tmp/nope.png" },
  "warnings": [],
  "elapsed_ms": 0
}

The process also exits non-zero on failure.
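
Because the envelope has the same shape on success and failure, a caller can parse stdout first and only then branch on ok. The sketch below shows one way to do that from Python. `VistoolsError`, `parse_envelope`, `run_vistools`, and the "UNKNOWN" fallback code are all illustrative names, not part of vistools.

```python
import json
import subprocess


class VistoolsError(Exception):
    """Raised when an envelope reports ok == false."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_envelope(stdout_text: str) -> dict:
    """Return `data` on success, raise VistoolsError on failure."""
    envelope = json.loads(stdout_text)
    if envelope.get("ok"):
        return envelope["data"]
    error = envelope.get("error", {})
    # "UNKNOWN" is a local fallback, not a vistools error code.
    raise VistoolsError(error.get("code", "UNKNOWN"), error.get("message", ""))


def run_vistools(*args: str) -> dict:
    """Run the CLI (assumed to be on PATH) and parse its envelope."""
    # check=False: failures are reported in the JSON envelope itself,
    # so the non-zero exit code should not raise before we read stdout.
    proc = subprocess.run(["vistools", *args],
                          capture_output=True, text=True, check=False)
    return parse_envelope(proc.stdout)
```

A call such as `run_vistools("inspect", "screenshot.png")` would then return the data object directly, and an agent can switch on `VistoolsError.code` instead of matching message text.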

Error codes

Code	Meaning
`FILE_NOT_FOUND`	Input file does not exist or is not a regular file
`UNSUPPORTED_FORMAT`	Image decoder could not read the file
`INVALID_DIMENSIONS`	Zero width/height passed to a command
`INVALID_COORDINATES`	Viewport/sample point or rect exceeds source bounds
`INVALID_PARAMETERS`	Tile count > 64, zero max side, malformed sample mode, etc.
`OUTPUT_WRITE_ERROR`	Could not write the output file
`PATH_ESCAPE`	Path contains `..`
`OUTPUT_SAME_AS_INPUT`	Output would overwrite the source
`PIXEL_LIMIT_EXCEEDED`	Source exceeds 100 megapixels
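
Some of these failures can be predicted before spawning the process. The sketch below is a client-side pre-flight check that mirrors three rows of the table. `preflight` is a hypothetical helper, it compares paths textually without resolving symlinks, and it is not how vistools implements its own guards.

```python
from pathlib import PurePath
from typing import Optional


def preflight(input_path: str, output_path: Optional[str] = None,
              rows: Optional[int] = None, cols: Optional[int] = None) -> Optional[str]:
    """Return the error code a call would likely hit, or None if it looks fine."""
    paths = [p for p in (input_path, output_path) if p is not None]
    # PATH_ESCAPE: any path component equal to ".."
    if any(".." in PurePath(p).parts for p in paths):
        return "PATH_ESCAPE"
    # OUTPUT_SAME_AS_INPUT: textual comparison only
    if output_path is not None and PurePath(output_path) == PurePath(input_path):
        return "OUTPUT_SAME_AS_INPUT"
    # INVALID_PARAMETERS: tile count above the documented limit of 64
    if rows is not None and cols is not None and rows * cols > 64:
        return "INVALID_PARAMETERS"
    return None
```

The CLI remains the source of truth; a check like this only saves a round trip for the obvious cases.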

Typical agent workflow

1. inspect src.png            # big image? what's the suggested grid?
       │
       ▼  needs_overview=true
2. overview src.png overview.png --max-side 1200
       │
       ▼  find region of interest in the overview
3a. tile src.png --rows 2 --cols 3 --out-dir ./tiles
       │
       ▼  or, if you know the area:
3b. viewport anchor src.png crop.png --anchor top-right --width 800 --height 600
       │
       ▼  coordinate_mapping tells you where (100, 50) in the crop lives in src.png
          (for a 3200 px wide source, the crop starts at x = 2400, giving (2500, 50))
4. sample src.png --x 2500 --y 50
       │
       ▼  inspect the exact color/alpha at the source coordinate
5. agent acts on the crop

The coordinate_mapping.formula string is the machine-readable recipe:

source_x = result_x + 2200, source_y = result_y          # crop
source_x = result_x / 0.375000, source_y = result_y / 0.375000   # overview
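
Rather than parsing the formula string, an agent can apply the structured fields of coordinate_mapping directly. The sketch below assumes that a null or missing scale_factor means "no scaling", that a missing crop_origin_in_source means an origin of (0, 0), and that results are rounded to whole pixels. `map_to_source` is a hypothetical helper.

```python
def map_to_source(mapping: dict, x: float, y: float) -> tuple[int, int]:
    """Translate a point in a generated view back to source coordinates.

    Assumed order of operations: undo the scale first, then add the crop
    origin. The two documented formulas each use only one of these steps.
    """
    origin_x, origin_y = mapping.get("crop_origin_in_source") or (0, 0)
    scale = mapping.get("scale_factor") or 1.0
    return (round(x / scale) + origin_x, round(y / scale) + origin_y)
```

With the crop mapping from the JSON example, the point (100, 50) maps to source (2300, 50). With an overview at scale 0.375, the point (800, 600) maps to source (2133, 1600), as in the walkthrough near the top.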

Skills

Skills are maintained in a separate repo: ZeroZ-lab/vistools-skills.

# Claude Code — install from the skills-only repo
/plugin install https://github.com/ZeroZ-lab/vistools-skills

# Then use: /vistools screenshot.png

Supports Claude Code, Cursor, and Codex.

Building

cargo build                       # debug
cargo build --release             # release (~5 MB, LTO + stripped)
cargo test                        # unit + integration
cargo clippy --all-targets -- -D warnings
cargo fmt --check

Supported input formats: PNG, JPEG, WebP, TIFF, BMP, GIF. Output format is inferred from the output file's extension.

Project layout

vistools/
├── crates/
│   ├── core/            # library: types, guard, coord, one module per command
│   └── cli/             # thin clap wrapper + integration tests
├── fixtures/            # unit-test images (64x64, 256x256, 1000x1000)
│   └── e2e/             # real-world test images
└── docs/                # design decisions (project.md), timeline, contracts

License

MIT / Apache-2.0, at your option.
