UTF-8 toolchain #11

@CyanoHao

General discussion for UTF-8 toolchain.

UTF-8 support in MinGW Lite:

  • UTF-8 manifest (general introduction, the manifest in that page is INCORRECT; correct manifest): injected into all executables on branches 15, 14 and 13. On Windows 10 1903+, the UTF-8 manifest makes the Windows API interpret char * parameters as UTF-8 encoded strings, so the toolchain gets Unicode support on Windows 10 1903+.
  • UTF-8 thunk: applied to all executables on branches 16+. Since char * strings enter and leave the program at the API boundary, replacing the -A APIs (more precisely, the __imp_ symbols in the import libraries) with thunks that convert between UTF-8 and UTF-16 and call the -W APIs (example) gives the toolchain full Unicode support on the NT family and best-effort support on 9x.
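
To make the thunk idea concrete, here is a minimal, portable sketch of the pattern in C. It is not MinGW Lite's actual thunk code: hypothetical_api_w stands in for a real -W API, hypothetical_api_a for the thunk that would replace the matching -A API, and utf8_to_utf16 is a hand-written converter with only basic validation. On Windows the conversion would more likely go through MultiByteToWideChar with CP_UTF8, and the wide string would be wchar_t rather than uint16_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: decode UTF-8 into a malloc'ed, NUL-terminated
   UTF-16 buffer. Returns NULL on malformed input or allocation failure.
   Validation is minimal (overlong forms are not rejected). */
uint16_t *utf8_to_utf16(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0, o = 0;
    while (p[n]) n++;
    /* UTF-16 never needs more code units than the UTF-8 input has bytes. */
    uint16_t *out = malloc((n + 1) * sizeof *out);
    if (!out) return NULL;
    while (*p) {
        uint32_t cp;
        int extra;
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else { free(out); return NULL; }
        p++;
        for (int i = 0; i < extra; i++, p++) {
            /* A missing continuation byte (including NUL) is an error. */
            if ((*p & 0xC0) != 0x80) { free(out); return NULL; }
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
        }
        if (cp > 0x10FFFF) { free(out); return NULL; }
        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[o++] = (uint16_t)cp;
        }
    }
    out[o] = 0;
    return out;
}

/* Stand-in for a -W API: returns the length in UTF-16 code units. */
size_t hypothetical_api_w(const uint16_t *name)
{
    size_t n = 0;
    while (name[n]) n++;
    return n;
}

/* Thunk with the -A signature: convert UTF-8 to UTF-16, forward to the
   -W function, release the temporary buffer. Returns (size_t)-1 if the
   conversion fails. */
size_t hypothetical_api_a(const char *name)
{
    uint16_t *wide = utf8_to_utf16(name);
    if (!wide) return (size_t)-1;
    size_t result = hypothetical_api_w(wide);
    free(wide);
    return result;
}
```

A caller keeps using the char * entry point, for example hypothetical_api_a("\xE4\xBD\xA0\xE5\xA5\xBD"), and never sees the UTF-16 buffer. APIs that return strings would need the reverse conversion on the way out, which this sketch does not cover.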

Beyond MinGW Lite:

  • The UTF-8 thunk can be applied not only to the toolchain executables, but also to the target import libraries and shared libraries delivered to users.
  • Windows provides too many APIs for it to be practical to provide thunks for all of them. But what if we limit the thunks to the CRT and STL? That is the idea behind the “u8crt” profiles in MinGW Lite’s build system, planned for the MinGW ∞ project.
