refactor(width): delegate CJK classification to xterm wcwidth table #76

Open
KoalaHao wants to merge 1 commit into Norbert515:main from marsup-space:refactor/unicode-width-xterm-wcwidth

KoalaHao commented Jun 5, 2026

Replaces the hand-maintained range whitelist in _isWideCharacter with a single call to the Unicode 11 wcwidth table already vendored at lib/src/third_party/xterm_pure.dart/src/utils/unicode_v11.dart. A probe confirmed that the table returns 2 for every range the whitelist covered (CJK ideographs, kana, hangul, fullwidth, CJK Symbols and Punctuation 《, 》, 「, 」, ...).

Four small layers stay manual:

  • Tab -> 1 column (table returns 0)
  • Zero-width characters (combining marks, ZWJ, variation selectors) -> 0, regardless of what the table says
  • General Punctuation 0x2010-0x205F -> 2 columns. The table returns 1 for these (East Asian Ambiguous), but CJK text renders em dash, smart quotes, ellipsis, primes, and reversed question/exclamation marks as full-width.
  • Emoji allowlist (regional indicators, certain Dingbats/Misc Symbols) bumps width-1 emoji to 2 when the table classifies them as Narrow/Neutral.

Removes the maintenance burden of hand-tracking Unicode ranges. A new CJK block added in a future Unicode revision is covered automatically, with no edits to this file.
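
For reviewers, the layering described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: WidthLookup, toyTableWidth, isZeroWidth, isAllowlistedEmoji, and charWidth are hypothetical names, the toy table only stands in for the vendored unicode_v11.dart lookup (whose real API may differ), and the zero-width and emoji sets are deliberately tiny samples rather than the actual lists.

```dart
// Hypothetical signature for a wcwidth-style lookup. The real API of the
// vendored unicode_v11.dart table may look different.
typedef WidthLookup = int Function(int codePoint);

// Toy stand-in for the vendored table so this sketch runs on its own.
int toyTableWidth(int cp) {
  if (cp < 0x20) return 0; // control characters, including tab
  if (cp >= 0x3000 && cp <= 0x303F) return 2; // CJK Symbols and Punctuation
  if (cp >= 0x4E00 && cp <= 0x9FFF) return 2; // CJK Unified Ideographs
  return 1; // everything else is treated as narrow here
}

// Sample zero-width set (assumption: the real list is longer).
bool isZeroWidth(int cp) =>
    (cp >= 0x0300 && cp <= 0x036F) || // combining diacritical marks
    cp == 0x200D || // zero width joiner
    (cp >= 0xFE00 && cp <= 0xFE0F); // variation selectors

// Sample emoji allowlist (assumption: illustrative picks only).
bool isAllowlistedEmoji(int cp) =>
    (cp >= 0x1F1E6 && cp <= 0x1F1FF) || // regional indicators
    cp == 0x2600 || // one Miscellaneous Symbols entry
    cp == 0x2764; // one Dingbats entry

int charWidth(int cp, WidthLookup tableWidth) {
  if (cp == 0x09) return 1; // tab: the table would say 0
  if (isZeroWidth(cp)) return 0; // wins regardless of the table
  if (cp >= 0x2010 && cp <= 0x205F) return 2; // General Punctuation override
  final width = tableWidth(cp);
  if (width == 1 && isAllowlistedEmoji(cp)) return 2; // bump narrow emoji
  return width; // everything else is delegated to the table
}

void main() {
  // tab, 'a', a CJK ideograph, em dash, regional indicator A, combining acute
  const samples = [0x09, 0x61, 0x4E2D, 0x2014, 0x1F1E6, 0x0301];
  for (final cp in samples) {
    final hex = cp.toRadixString(16).toUpperCase();
    print('U+$hex -> ${charWidth(cp, toyTableWidth)}');
  }
}
```

The manual layers run before the table lookup so that an override never depends on what the table returns. Only the emoji bump inspects the table result, since it applies just to code points the table reports as width 1.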
