Use Content-Type charset when decoding download filenames (#3135)#4184
Open
linesight wants to merge 4 commits into
Open
Use Content-Type charset when decoding download filenames (#3135)#4184linesight wants to merge 4 commits into
linesight wants to merge 4 commits into
Conversation
…ame decoding (fixes issue chromiumembedded#3135) On macOS and Linux, when a server sends a non-UTF-8 Chinese filename in Content-Disposition (e.g., GBK-encoded) without an RFC 5987 filename* parameter, Chromium's DecodeWord() cannot decode it because it has no charset hint and falls back to SysNativeMBToWide(). On macOS this assumes UTF-8 input, producing garbled output. Fix by extracting the charset from the HTTP Content-Type response header (e.g., "text/html; charset=gb2312") and passing it as referrer_charset to net::GenerateFileName(), which uses it via CharsetConverter to properly decode the filename.
- Remove debug LOG(ERROR) from production code - Rename test constants for clarity (kTestChineseDomain etc.) - Add destroyed_ guard, request_context_ member, thread assertions - Add content verification and factory unregister in test cleanup - Remove GBK test case that cannot work through CefString API (CefString assumes UTF-8 and corrupts raw non-UTF-8 header bytes)
…overage - ChineseFilenameRawUTF8: IsStringUTF8() is platform-independent; valid UTF-8 bytes are auto-detected regardless of system locale - ChineseFilenameContentTypeCharset: clarify this exercises the charset extraction path but would pass even without the fix (UTF-8 auto-detection) - Document that a true GBK regression guard is blocked by CefString API
magreenblatt
requested changes
Jun 1, 2026
Collaborator
There was a problem hiding this comment.
Thanks for the fix!
- ChineseFilenameContentTypeCharset: clarify this exercises the charset
extraction path but would pass even without the fix (UTF-8 auto-detection)
As you mention, the test coverage in this PR doesn't actually cover the fix. The easiest option is to extend the existing CefTestServer API with a new method that takes all header data a single void* buffer.
A new method on CefTestServerConnection:
virtual void SendHttpResponseWithRawHeaders(const void* header_data,
size_t header_data_size,
const void* response_data,
size_t response_data_size) = 0;
header_data is the complete raw HTTP response header block including the status line, e.g.:
HTTP/1.1 200 OK\r\n
Content-Type: application/vnd.ms-excel; charset=gb2312\r\n
Content-Disposition: attachment; filename=<raw GBK bytes>\r\n
response_data is the response body. Both are void* + size_t pairs, which the CEF translator already handles (same pattern as
the existing data/data_size parameters on SendHttp200Response and SendHttpResponse).
The implementation in CefTestServerConnectionImpl uses the existing RawHttpResponse class from
net/test/embedded_test_server/http_response.h, which takes a headers string and contents string and sends them directly to
the HttpResponseDelegate via SendRawResponseHeaders() — no CefString conversion anywhere. It follows the same thread-bouncing
pattern as the existing SendBasicHttpResponse private method.
This enables the GBK download filename test: the test constructs a std::string containing raw GBK bytes in the
Content-Disposition header and charset=gb2312 in the Content-Type header, passes it as header_data. The bytes reach
Chromium's network stack intact, GetCharset() returns gb2312, the fix passes it to net::GenerateFileName(), and DecodeWord
decodes the GBK filename correctly.
Pass a real charset to net::GenerateFileName() instead of an empty string so filenames in legacy encodings like GBK decode properly. Use the charset from the Content-Type header when the server sends one, and otherwise fall back to the profile's default charset (intl.charset_default), same as Chrome's own download delegate does. My original test didn't really test the fix: it sent UTF-8 bytes that get auto-detected anyway, so it passed with or without the change. Replaced it with one that sends raw GBK bytes through a real CefTestServer. That needed a new SendHttpResponseWithRawHeaders() on CefTestServerConnection, because the existing CefString-based header API mangles non-UTF-8 bytes before they reach the wire. Without the fix the filename comes through as "order.xls"; with it you get the expected Chinese name. Ran it 100x to make sure it isn't flaky.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #3135
When a server's
Content-Dispositioncarries non-ASCII filename bytes without an RFC 5987 filename*,HttpContentDisposition::DecodeWordneeds a charset hint. CEF currently passes an empty string, soDecodeWordfalls back throughIsStringUTF8()toSysNativeMBToWide()— the platform locale. On macOS and Linux that produces garbled filenames whenever the server's encoding doesn't match the user's locale, most commonly Chinese sites that emit GBK filenames withContent-Type: ...; charset=gb2312.The fix pulls the charset off the in-flight response headers and passes it as
referrer_charset.When no charset is declared,
GetCharsetis a no-op and behavior is identical to before.