fix(provider): Fix inaccurate MIME type declaration for base64:// image references #8177
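- Add a `_detect_image_format` method that uses Pillow's `verify()` to detect the real image format, avoiding the extra cost of fully decoding pixels.
- Add a `_base64_image_ref_to_data_url` method that converts base64:// references into data URLs carrying the real MIME type, fixing PNG/GIF/WebP images being wrongly declared as image/jpeg.
- Extract the `_IMAGE_FORMAT_MIME_TYPES` class constant and the `_image_format_to_mime_type` method to unify the format mapping for local files and base64:// references, and add TIFF/AVIF support.
- Add the unit test `test_resolve_image_part_preserves_base64_png_mime_type`, covering the case where a PNG image's MIME type is declared correctly.

Closes AstrBotDevs#8174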
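The format-to-MIME mapping described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the dictionary entries are assumed (the description only names the constant and mentions PNG, GIF, WebP, TIFF and AVIF), and the module-level names `IMAGE_FORMAT_MIME_TYPES` and `image_format_to_mime_type` are hypothetical stand-ins for the class constant and method. The JPEG default mirrors the legacy behavior mentioned in the review.

```python
from typing import Optional

# Assumed entries: keys follow Pillow-style format names (e.g. "PNG", "WEBP").
IMAGE_FORMAT_MIME_TYPES = {
    "JPEG": "image/jpeg",
    "PNG": "image/png",
    "GIF": "image/gif",
    "WEBP": "image/webp",
    "TIFF": "image/tiff",
    "AVIF": "image/avif",
}

DEFAULT_MIME_TYPE = "image/jpeg"  # legacy default for unknown formats


def image_format_to_mime_type(image_format: Optional[str]) -> str:
    """Map a format name to a MIME type, falling back to JPEG when unknown."""
    if not image_format:
        return DEFAULT_MIME_TYPE
    return IMAGE_FORMAT_MIME_TYPES.get(image_format.upper(), DEFAULT_MIME_TYPE)
```

Keeping the table in one place means the local-file path and the base64:// path cannot drift apart when a new format is added.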
Hey - I've found 1 issue, and left some high level feedback:
- In `_base64_image_ref_to_data_url`, you catch `binascii.Error` from `base64.b64decode` but `binascii` is not defined in this file; consider either importing `binascii` explicitly or catching the exception via `base64.binascii.Error` (or a broader `Exception`) to avoid a `NameError` at runtime.
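The first option in this comment, importing `binascii` explicitly, might look like the sketch below. The helper name `decode_base64_or_none` is hypothetical and only isolates the decode step; it is not the PR's method.

```python
import base64
import binascii  # explicit import so the except clause cannot raise NameError
from typing import Optional


def decode_base64_or_none(raw_base64: str) -> Optional[bytes]:
    """Decode a base64 payload, returning None instead of raising on bad input."""
    try:
        return base64.b64decode(raw_base64)
    except (binascii.Error, ValueError):
        # binascii.Error is a ValueError subclass; both are listed for clarity.
        return None
```

Catching the specific exception rather than a broad `Exception` keeps unrelated bugs visible while still tolerating malformed payloads.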
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_base64_image_ref_to_data_url`, you catch `binascii.Error` from `base64.b64decode` but `binascii` is not defined in this file; consider either importing `binascii` explicitly or catching the exception via `base64.binascii.Error` (or a broader `Exception`) to avoid a `NameError` at runtime.
## Individual Comments
### Comment 1
<location path="tests/test_openai_source.py" line_range="1045-1047" />
<code_context>
+ image_buffer,
+ format="PNG",
+ )
+ image_base64 = base64.b64encode(image_buffer.getvalue()).decode("ascii")
+
+ image_part = await provider._resolve_image_part(f"base64://{image_base64}")
+
+ assert image_part == {
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for invalid or malformed base64 inputs to ensure legacy JPEG behavior is preserved
Since `_base64_image_ref_to_data_url` is intended to preserve the existing behavior for invalid/malformed base64 (or non-image) input by still returning a JPEG data URL, please add a test where `image_ref` is `base64://` plus an invalid base64 string (e.g. `"not-base64"`) and assert that `_resolve_image_part` still returns a `data:image/jpeg;base64,...` URL rather than raising. This will explicitly cover the backward-compatibility case described in the implementation comments.
</issue_to_address>
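A test along the lines suggested here could be sketched as below. Because the provider class is not shown on this page, the sketch uses a hypothetical synchronous stand-in, `resolve_base64_ref`, that returns only the data URL string and recognizes only PNG by its signature; the real `_resolve_image_part` is async and returns a dict whose shape is not visible here.

```python
import base64
import binascii

PREFIX = "base64://"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def resolve_base64_ref(image_ref: str) -> str:
    """Hypothetical stand-in: turn a base64:// reference into a data URL."""
    raw_base64 = image_ref.removeprefix(PREFIX)
    mime_type = "image/jpeg"  # legacy default, kept for undetectable input
    try:
        image_bytes = base64.b64decode(raw_base64)
    except (binascii.Error, ValueError):
        image_bytes = b""
    if image_bytes.startswith(PNG_SIGNATURE):
        mime_type = "image/png"
    return f"data:{mime_type};base64,{raw_base64}"


def test_invalid_base64_falls_back_to_jpeg():
    # Malformed payload: must not raise, and must keep the JPEG declaration.
    assert resolve_base64_ref("base64://not-base64") == (
        "data:image/jpeg;base64,not-base64"
    )


def test_non_image_base64_falls_back_to_jpeg():
    # Valid base64 that is not an image also keeps the JPEG declaration.
    payload = base64.b64encode(b"hello").decode("ascii")
    assert resolve_base64_ref(f"base64://{payload}") == (
        f"data:image/jpeg;base64,{payload}"
    )
```

In the actual test suite the same assertions would be made against the awaited result of `provider._resolve_image_part(...)`.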
Code Review
This pull request implements dynamic MIME type detection for image attachments within the OpenAI provider, replacing the previous hardcoded JPEG default for base64:// references. It introduces helper methods for format detection and MIME mapping, along with corresponding unit tests. Feedback suggested a minor performance optimization for large base64 strings during format detection.
# Platform adapters may pass PNG/GIF/WebP image bytes via `base64://`,
# but they carry no additional MIME metadata. Detect the real format before
# sending the OpenAI request to avoid wrongly declaring PNG and other images as JPEG.
image_bytes = base64.b64decode(raw_base64)
Decoding the entire base64 string into memory just to detect the image format can be inefficient for very large images. Since most image headers are within the first few dozen bytes, you could potentially optimize this by decoding only the beginning of the string. However, given that the full base64 is required for the final data URL and Pillow's verify() is relatively lightweight, this is a minor performance consideration. Additionally, as this introduces new functionality for handling attachments, please ensure it is accompanied by corresponding unit tests.
References
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
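The header-only optimization mentioned in this comment could be sketched as follows. This swaps in plain magic-byte sniffing rather than Pillow's `verify()`, so it is weaker: it only inspects the first bytes and does not validate the file. The function name, the 64-character window, and the set of recognized formats are all assumptions for illustration, and the sketch assumes the payload contains no embedded whitespace.

```python
import base64
import binascii
from typing import Optional

HEADER_CHARS = 64  # multiple of 4, so the slice decodes cleanly to 48 bytes


def sniff_format_from_base64_prefix(raw_base64: str) -> Optional[str]:
    """Guess the image format from the first few decoded bytes only."""
    try:
        header = base64.b64decode(raw_base64[:HEADER_CHARS])
    except (binascii.Error, ValueError):
        return None
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WEBP"
    return None
```

As the comment itself notes, the full base64 string is still needed for the final data URL, so this only avoids decoding the whole payload a second time for detection.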
Modifications
Screenshots or Test Results
Checklist
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
😮 My changes do not introduce malicious code.
Summary by Sourcery
Ensure base64:// image references are converted to data URLs with correctly detected MIME types and extend image format support.
Bug Fixes:
Enhancements:
Tests: