Use Content-Type charset when decoding download filenames (#3135) by linesight · Pull Request #4184 · chromiumembedded/cef

linesight · 2026-05-31T23:54:00Z

When a server's Content-Disposition carries non-ASCII filename bytes without an RFC 5987 filename*, HttpContentDisposition::DecodeWord needs a charset hint. CEF currently passes an empty string, so DecodeWord falls back through IsStringUTF8() to SysNativeMBToWide() — the platform locale. On macOS and Linux that produces garbled filenames whenever the server's encoding doesn't match the user's locale, most commonly Chinese sites that emit GBK filenames with Content-Type: ...; charset=gb2312.

The fix pulls the charset off the in-flight response headers and passes it as referrer_charset.

When no charset is declared, GetCharset is a no-op and behavior is identical to before.

…ame decoding (fixes issue chromiumembedded#3135) On macOS and Linux, when a server sends a non-UTF-8 Chinese filename in Content-Disposition (e.g., GBK-encoded) without an RFC 5987 filename* parameter, Chromium's DecodeWord() cannot decode it because it has no charset hint and falls back to SysNativeMBToWide(). On macOS this assumes UTF-8 input, producing garbled output. Fix by extracting the charset from the HTTP Content-Type response header (e.g., "text/html; charset=gb2312") and passing it as referrer_charset to net::GenerateFileName(), which uses it via CharsetConverter to properly decode the filename.

- Remove debug LOG(ERROR) from production code - Rename test constants for clarity (kTestChineseDomain etc.) - Add destroyed_ guard, request_context_ member, thread assertions - Add content verification and factory unregister in test cleanup - Remove GBK test case that cannot work through CefString API (CefString assumes UTF-8 and corrupts raw non-UTF-8 header bytes)

…overage - ChineseFilenameRawUTF8: IsStringUTF8() is platform-independent; valid UTF-8 bytes are auto-detected regardless of system locale - ChineseFilenameContentTypeCharset: clarify this exercises the charset extraction path but would pass even without the fix (UTF-8 auto-detection) - Document that a true GBK regression guard is blocked by CefString API

magreenblatt

Thanks for the fix!

ChineseFilenameContentTypeCharset: clarify this exercises the charset
extraction path but would pass even without the fix (UTF-8 auto-detection)

As you mention, the test coverage in this PR doesn't actually cover the fix. The easiest option is to extend the existing CefTestServer API with a new method that takes all header data a single void* buffer.

  A new method on CefTestServerConnection:
  
  virtual void SendHttpResponseWithRawHeaders(const void* header_data,
                                              size_t header_data_size,
                                              const void* response_data,
                                              size_t response_data_size) = 0;

  header_data is the complete raw HTTP response header block including the status line, e.g.:

  HTTP/1.1 200 OK\r\n
  Content-Type: application/vnd.ms-excel; charset=gb2312\r\n
  Content-Disposition: attachment; filename=<raw GBK bytes>\r\n

  response_data is the response body. Both are void* + size_t pairs, which the CEF translator already handles (same pattern as
  the existing data/data_size parameters on SendHttp200Response and SendHttpResponse).

  The implementation in CefTestServerConnectionImpl uses the existing RawHttpResponse class from
  net/test/embedded_test_server/http_response.h, which takes a headers string and contents string and sends them directly to
  the HttpResponseDelegate via SendRawResponseHeaders() — no CefString conversion anywhere. It follows the same thread-bouncing
  pattern as the existing SendBasicHttpResponse private method.

  This enables the GBK download filename test: the test constructs a std::string containing raw GBK bytes in the
  Content-Disposition header and charset=gb2312 in the Content-Type header, passes it as header_data. The bytes reach
  Chromium's network stack intact, GetCharset() returns gb2312, the fix passes it to net::GenerateFileName(), and DecodeWord
  decodes the GBK filename correctly.

Pass a real charset to net::GenerateFileName() instead of an empty string so filenames in legacy encodings like GBK decode properly. Use the charset from the Content-Type header when the server sends one, and otherwise fall back to the profile's default charset (intl.charset_default), same as Chrome's own download delegate does. My original test didn't really test the fix: it sent UTF-8 bytes that get auto-detected anyway, so it passed with or without the change. Replaced it with one that sends raw GBK bytes through a real CefTestServer. That needed a new SendHttpResponseWithRawHeaders() on CefTestServerConnection, because the existing CefString-based header API mangles non-UTF-8 bytes before they reach the wire. Without the fix the filename comes through as "order.xls"; with it you get the expected Chinese name. Ran it 100x to make sure it isn't flaky.

linesight added 3 commits May 31, 2026 08:46

magreenblatt requested changes Jun 1, 2026

View reviewed changes

Comment thread libcef/browser/download_manager_delegate_impl.cc

linesight requested a review from magreenblatt June 5, 2026 02:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use Content-Type charset when decoding download filenames (#3135)#4184

Use Content-Type charset when decoding download filenames (#3135)#4184
linesight wants to merge 4 commits into
chromiumembedded:masterfrom
linesight:fix-3135-chinese-filename-charset

linesight commented May 31, 2026 •

edited

Loading

Uh oh!

magreenblatt left a comment •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

linesight commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

magreenblatt left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

linesight commented May 31, 2026 •

edited

Loading

magreenblatt left a comment •

edited

Loading