
[codex] Bound screenshot payloads #17

Merged
avifenesh merged 1 commit into main from fix/screenshot-payload-controls
Jun 5, 2026

@avifenesh
Collaborator

Summary

  • bound screenshot payloads by default before returning image content to MCP hosts
  • keep raw capture dimensions for coordinate-sensitive paths, then crop before resizing/compression
  • add opt-in screenshot controls: max_width, max_height, max_bytes, scale, format=jpeg, and quality
  • expose returned-vs-coordinate dimensions, scale, byte counts, format, and quality in screenshot metadata
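A minimal sketch of how bounds such as max_width and max_height could translate into returned dimensions and a scale factor, and how that scale maps a point in the returned image back to capture coordinates. The function names and the rounding policy are assumptions for illustration and are not taken from this PR; bounds are assumed to be non-zero.

```rust
/// Hypothetical helper (not from the PR): shrink (width, height) to fit the
/// optional bounds, preserving aspect ratio and never upscaling.
/// Returns (returned_width, returned_height, scale).
fn bounded_dimensions(
    width: u32,
    height: u32,
    max_width: Option<u32>,
    max_height: Option<u32>,
) -> (u32, u32, f64) {
    // Start at 1.0 so an image that is already small enough is left alone.
    let mut scale = 1.0_f64;
    if let Some(mw) = max_width {
        scale = scale.min(mw as f64 / width as f64);
    }
    if let Some(mh) = max_height {
        scale = scale.min(mh as f64 / height as f64);
    }
    // Clamp to at least one pixel per axis.
    let w = ((width as f64 * scale).round() as u32).max(1);
    let h = ((height as f64 * scale).round() as u32).max(1);
    (w, h, scale)
}

/// Hypothetical helper: map a point in the returned (scaled) image back to
/// the coordinate space of the raw capture. Assumes scale > 0.
fn to_capture_coords(x: u32, y: u32, scale: f64) -> (u32, u32) {
    (
        (x as f64 / scale).round() as u32,
        (y as f64 / scale).round() as u32,
    )
}
```

Under these assumptions, a 3840x2160 capture bounded to a max_width of 1920 is returned at 1920x1080 with a scale of 0.5, and a click at (100, 50) in the returned image corresponds to (200, 100) in capture coordinates.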

Closes #16.

Validation

  • cargo fmt --all -- --check
  • cargo test --locked
  • cargo clippy --locked --all-targets -- -D warnings
  • cargo build --locked
  • scripts/mcp_safety_check.py --binary target/debug/computer-use-linux
  • node scripts/zod-check/check.mjs --command target/debug/computer-use-linux
  • agnix .
  • git diff --check
  • live default CLI screenshot
  • live MCP screenshot call with format=jpeg, quality=55


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces size-bounded screenshot payloads by default, exposing opt-in sizing controls, coordinate metadata, and JPEG compression support to prevent large payloads from exceeding host limits. Review feedback highlights two performance improvement opportunities: optimizing encode_screenshot_to_fit_bytes to avoid cloning the uncompressed DynamicImage buffer and using a faster resizing filter, and refactoring the screenshot pipeline to eliminate an inefficient double decode/encode cycle when cropping and resizing targeted windows.
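For orientation, a fit-to-bytes step of the kind named above could look roughly like the loop below. This is a sketch under stated assumptions, not the code in src/screenshot.rs: the 85% shrink step, the stop at 1x1, and the `encode` closure (a stand-in for a real PNG or JPEG encoder) are all hypothetical.

```rust
/// Hypothetical sketch: re-encode at progressively smaller dimensions until
/// the payload fits in `max_bytes`, or until the image cannot shrink further.
/// `encode(w, h)` stands in for "resize to w x h, then encode".
fn fit_to_bytes<E>(width: u32, height: u32, max_bytes: usize, encode: E) -> (u32, u32, Vec<u8>)
where
    E: Fn(u32, u32) -> Vec<u8>,
{
    let (mut w, mut h) = (width.max(1), height.max(1));
    loop {
        let bytes = encode(w, h);
        // Stop when the payload fits, or when 1x1 is reached (best effort).
        if bytes.len() <= max_bytes || (w == 1 && h == 1) {
            return (w, h, bytes);
        }
        // Shrink both axes to 85% using integer math; never drop below 1.
        w = ((w as u64 * 85 / 100) as u32).max(1);
        h = ((h as u64 * 85 / 100) as u32).max(1);
    }
}
```

With a toy encoder that emits one byte per pixel, a 100x100 input and a 5000-byte budget steps through 85x85 and 72x72 before settling on 61x61 (3721 bytes).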

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/screenshot.rs
Comment on lines +404 to +416
        let bytes = if options.format == ScreenshotOutputFormat::Png
            && target_width == original_width
            && target_height == original_height
        {
            raw.to_vec()
        } else {
            let output = if target_width == original_width && target_height == original_height {
                img.clone()
            } else {
                img.resize_exact(target_width, target_height, FilterType::Lanczos3)
            };
            encode_image(&output, options)?
        };

medium

In encode_screenshot_to_fit_bytes, when options.format is ScreenshotOutputFormat::Jpeg or when the PNG bytes exceed max_bytes on the first iteration, img.clone() is called to clone the entire DynamicImage. Since DynamicImage holds the uncompressed pixel buffer (which can be tens of megabytes for high-resolution screenshots), cloning it is highly inefficient and causes unnecessary memory allocations and CPU overhead.

We can avoid cloning img entirely by passing a reference &img directly to encode_image when no resizing is needed, and only calling resize_exact when the dimensions actually change. Additionally, FilterType::CatmullRom (bicubic) uses a smaller kernel than FilterType::Lanczos3, so it should downscale faster, with little visible quality difference for screenshots.

        let bytes = if options.format == ScreenshotOutputFormat::Png
            && target_width == original_width
            && target_height == original_height
        {
            raw.to_vec()
        } else if target_width == original_width && target_height == original_height {
            encode_image(&img, options)?
        } else {
            let resized = img.resize_exact(target_width, target_height, FilterType::CatmullRom);
            encode_image(&resized, options)?
        };
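The same borrow-when-unchanged idea can also be expressed with std::borrow::Cow, which keeps a single encode call site. The sketch below is an alternative to the branch-per-case suggestion above, not what the PR does; `Image` is a stand-in type (one byte per pixel, nearest-neighbour resize) used only so the example is self-contained, and `output_image` is a hypothetical name.

```rust
use std::borrow::Cow;

/// Stand-in for a decoded image (assumption: one byte per pixel).
#[derive(Clone, Debug, PartialEq)]
struct Image {
    width: u32,
    height: u32,
    pixels: Vec<u8>,
}

impl Image {
    /// Toy nearest-neighbour resize, standing in for a real resampler.
    fn resize_exact(&self, w: u32, h: u32) -> Image {
        let mut pixels = Vec::with_capacity(w as usize * h as usize);
        for y in 0..h {
            let sy = (y as u64 * self.height as u64 / h as u64) as usize;
            for x in 0..w {
                let sx = (x as u64 * self.width as u64 / w as u64) as usize;
                pixels.push(self.pixels[sy * self.width as usize + sx]);
            }
        }
        Image { width: w, height: h, pixels }
    }
}

/// Borrow the original when the target size is unchanged; allocate a new
/// buffer only when a resize is actually required.
fn output_image(img: &Image, target_w: u32, target_h: u32) -> Cow<'_, Image> {
    if target_w == img.width && target_h == img.height {
        Cow::Borrowed(img)
    } else {
        Cow::Owned(img.resize_exact(target_w, target_h))
    }
}
```

Because Cow dereferences to the underlying type, a caller could pass `&*output_image(&img, w, h)` to an encoder that takes `&Image`, and the unchanged-size path never copies the pixel buffer.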

Comment thread src/server.rs
Comment on lines +359 to +379
         let (capture, cropped) = match crop.as_ref().and_then(window_crop_rect) {
             Some((x, y, w, h)) => match crop_png(&raw_capture.bytes, x, y, w, h) {
                 Ok((bytes, cw, ch)) => (
                     RawScreenshotCapture {
                         mime_type: raw_capture.mime_type.clone(),
                         bytes,
                         source: raw_capture.source.clone(),
                         width: cw,
                         height: ch,
                     },
                     true,
                 ),
                 // If cropping fails, fall back to the full frame rather than erroring.
-                Err(_) => (raw, capture.width, capture.height, false),
+                Err(_) => (raw_capture, false),
             },
-            None => (raw, capture.width, capture.height, false),
+            None => (raw_capture, false),
         };
         let capture =
             prepare_screenshot_payload(capture, params.screenshot_options()).map_err(|e| {
                 ErrorData::internal_error(format!("screenshot resize failed: {e}"), None)
             })?;

medium

When a window target is cropped, there is an inefficient double decode/encode cycle:

  1. crop_png decodes the raw PNG bytes into a DynamicImage, crops it, and encodes it back to PNG bytes.
  2. prepare_screenshot_payload is then called with this cropped PNG.
  3. Inside prepare_screenshot_payload, encode_screenshot_to_fit_bytes is called, which decodes the cropped PNG bytes again into a DynamicImage to perform resizing/encoding.

This double decode/encode cycle is extremely CPU and memory intensive, especially for large screenshots. Consider refactoring the pipeline so that the image is decoded only once. For example, crop_png could return the cropped DynamicImage directly (or we could have a unified function that handles both cropping and resizing/encoding on the decoded image), and only encode it once at the very end of the pipeline.
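One possible shape for a decode-once pipeline is sketched below: crop on the already-decoded pixels, fall back to the full frame if the crop rectangle is invalid, and encode exactly once at the end. `Decoded`, `crop`, and `crop_then_encode_once` are hypothetical names, the pixel layout (one byte per pixel) is a simplification, and the `encode` closure stands in for the real resize-and-encode step.

```rust
/// Stand-in for an already-decoded image (assumption: one byte per pixel).
#[derive(Clone, Debug, PartialEq)]
struct Decoded {
    width: u32,
    height: u32,
    pixels: Vec<u8>,
}

impl Decoded {
    /// Crop on decoded pixels. Returns None if the rectangle is empty or
    /// does not lie fully inside the image.
    fn crop(&self, x: u32, y: u32, w: u32, h: u32) -> Option<Decoded> {
        if w == 0 || h == 0 {
            return None;
        }
        if x.checked_add(w)? > self.width || y.checked_add(h)? > self.height {
            return None;
        }
        let stride = self.width as usize;
        let mut pixels = Vec::with_capacity(w as usize * h as usize);
        for row in y..y + h {
            let start = row as usize * stride + x as usize;
            pixels.extend_from_slice(&self.pixels[start..start + w as usize]);
        }
        Some(Decoded { width: w, height: h, pixels })
    }
}

/// Crop if a valid rectangle is given, otherwise keep the full frame, then
/// call `encode` exactly once. Returns (bytes, cropped).
fn crop_then_encode_once<E>(
    decoded: Decoded,
    rect: Option<(u32, u32, u32, u32)>,
    encode: E,
) -> (Vec<u8>, bool)
where
    E: Fn(&Decoded) -> Vec<u8>,
{
    let cropped = rect.and_then(|(x, y, w, h)| decoded.crop(x, y, w, h));
    match cropped {
        Some(img) => (encode(&img), true),
        // Invalid or missing rectangle: fall back to the full frame.
        None => (encode(&decoded), false),
    }
}
```

In this arrangement the raw capture would be decoded once by the caller, and both the cropped and the fallback paths share the single final encode.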

@avifenesh avifenesh marked this pull request as ready for review June 5, 2026 15:45
@avifenesh avifenesh merged commit d8fb08d into main Jun 5, 2026
20 checks passed
@avifenesh avifenesh deleted the fix/screenshot-payload-controls branch June 5, 2026 15:45

Linked issue: Screenshot tool can return images that permanently destroy a session
