[codex] Bound screenshot payloads#17
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces size-bounded screenshot payloads by default, exposing opt-in sizing controls, coordinate metadata, and JPEG compression support to prevent large payloads from exceeding host limits. Review feedback highlights two performance improvement opportunities: optimizing encode_screenshot_to_fit_bytes to avoid cloning the uncompressed DynamicImage buffer and using a faster resizing filter, and refactoring the screenshot pipeline to eliminate an inefficient double decode/encode cycle when cropping and resizing targeted windows.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| let bytes = if options.format == ScreenshotOutputFormat::Png | ||
| && target_width == original_width | ||
| && target_height == original_height | ||
| { | ||
| raw.to_vec() | ||
| } else { | ||
| let output = if target_width == original_width && target_height == original_height { | ||
| img.clone() | ||
| } else { | ||
| img.resize_exact(target_width, target_height, FilterType::Lanczos3) | ||
| }; | ||
| encode_image(&output, options)? | ||
| }; |
There was a problem hiding this comment.
In encode_screenshot_to_fit_bytes, when options.format is ScreenshotOutputFormat::Jpeg or when the PNG bytes exceed max_bytes on the first iteration, img.clone() is called to clone the entire DynamicImage. Since DynamicImage holds the uncompressed pixel buffer (which can be tens of megabytes for high-resolution screenshots), cloning it is highly inefficient and causes unnecessary memory allocations and CPU overhead.
We can avoid cloning img entirely by passing a reference &img directly to encode_image when no resizing is needed, and only calling resize_exact when the dimensions actually change. Additionally, using FilterType::CatmullRom (bicubic) instead of FilterType::Lanczos3 for downscaling provides a massive speedup with virtually indistinguishable quality for screenshots.
let bytes = if options.format == ScreenshotOutputFormat::Png
&& target_width == original_width
&& target_height == original_height
{
raw.to_vec()
} else if target_width == original_width && target_height == original_height {
encode_image(&img, options)?
} else {
let resized = img.resize_exact(target_width, target_height, FilterType::CatmullRom);
encode_image(&resized, options)?
};| let (capture, cropped) = match crop.as_ref().and_then(window_crop_rect) { | ||
| Some((x, y, w, h)) => match crop_png(&raw_capture.bytes, x, y, w, h) { | ||
| Ok((bytes, cw, ch)) => ( | ||
| RawScreenshotCapture { | ||
| mime_type: raw_capture.mime_type.clone(), | ||
| bytes, | ||
| source: raw_capture.source.clone(), | ||
| width: cw, | ||
| height: ch, | ||
| }, | ||
| true, | ||
| ), | ||
| // If cropping fails, fall back to the full frame rather than erroring. | ||
| Err(_) => (raw, capture.width, capture.height, false), | ||
| Err(_) => (raw_capture, false), | ||
| }, | ||
| None => (raw, capture.width, capture.height, false), | ||
| None => (raw_capture, false), | ||
| }; | ||
| let capture = | ||
| prepare_screenshot_payload(capture, params.screenshot_options()).map_err(|e| { | ||
| ErrorData::internal_error(format!("screenshot resize failed: {e}"), None) | ||
| })?; |
There was a problem hiding this comment.
When a window target is cropped, there is an inefficient double decode/encode cycle:
crop_pngdecodes the raw PNG bytes into aDynamicImage, crops it, and encodes it back to PNG bytes.prepare_screenshot_payloadis then called with this cropped PNG.- Inside
prepare_screenshot_payload,encode_screenshot_to_fit_bytesis called, which decodes the cropped PNG bytes again into aDynamicImageto perform resizing/encoding.
This double decode/encode cycle is extremely CPU and memory intensive, especially for large screenshots. Consider refactoring the pipeline so that the image is decoded only once. For example, crop_png could return the cropped DynamicImage directly (or we could have a unified function that handles both cropping and resizing/encoding on the decoded image), and only encode it once at the very end of the pipeline.
Summary
Closes #16.
Validation