Fix content resizing "Shorten" action for CJK languages #715
i-anubhav-anand wants to merge 2 commits into
For languages like Japanese, Chinese, and Korean that don't use spaces
as word separators, `count( text, 'words', {} )` returns a very small
number (often 0 or 1), causing the "Text is too short to shorten
further." error even for long paragraphs.
Detect CJK content and use `characters_excluding_spaces` count with a
character-based minimum threshold instead. Apply the same locale-aware
counting in the word-diff display so the +/- indicator remains
meaningful for CJK text.
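Below is a minimal TypeScript sketch of the counting switch described above. It uses simplified stand-ins for `@wordpress/wordcount`'s `count()`: words are split on whitespace, and characters are counted after stripping whitespace. It does not use the library itself, so exact numbers may differ from what the editor reports. `CJK_REGEX`, `SHORTEN_MIN_WORDS = 5` and `SHORTEN_MIN_CHARS = 10` follow the PR description. `countWords`, `countCharsExcludingSpaces` and `shortenGuard` are hypothetical names, and the strict `<` comparison is an assumption.

```typescript
// Values from the PR description; the regex uses the escaped ranges from the
// second commit. No g flag, so .test() keeps no lastIndex state between calls.
const CJK_REGEX = /[\u3000-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]/;
const SHORTEN_MIN_WORDS = 5;
const SHORTEN_MIN_CHARS = 10;

// Simplified stand-in for count( text, 'words', {} ): splits on whitespace.
// Text with no spaces, such as Japanese, comes out as a single "word".
function countWords(text: string): number {
  return text.split(/\s+/).filter((word) => word.length > 0).length;
}

// Simplified stand-in for count( text, 'characters_excluding_spaces', {} ).
// JavaScript's \s also matches U+3000, so ideographic spaces are dropped too.
function countCharsExcludingSpaces(text: string): number {
  return text.replace(/\s/g, '').length;
}

// Hypothetical guard modelled on handleAction('shorten'): returns the error
// message when the text is below the minimum for its unit, otherwise null.
function shortenGuard(text: string): string | null {
  const tooShort = CJK_REGEX.test(text)
    ? countCharsExcludingSpaces(text) < SHORTEN_MIN_CHARS
    : countWords(text) < SHORTEN_MIN_WORDS;
  return tooShort ? 'Text is too short to shorten further.' : null;
}

console.log(shortenGuard('これはテストです。日本語のコンテンツをテストしています。')); // null
console.log(shortenGuard('これは短い')); // Text is too short to shorten further.
console.log(shortenGuard('The quick brown fox jumps')); // null
```

Under word counting, the 28-character Japanese sample counts as one "word" and falls below `SHORTEN_MIN_WORDS`. Under character counting, it clears the 10-character minimum. In this sketch, a single CJK character anywhere switches the whole text to character counting. That rule is simple, but it also treats mixed-language text as CJK.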
The regex literal contained a raw ideographic space (U+3000) as the start of its first character range, which ESLint's `no-irregular-whitespace` rule rejects. Escaped ranges are equivalent and easier to review: `\u3000-\u9FFF`, `\uAC00-\uD7FF`, `\uFF01-\uFF60`.
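The sketch below checks the equivalence claim, assuming a standard JavaScript/TypeScript runtime. It compares the escaped literal against a variant whose first range starts with an actual U+3000 character. The raw variant is built at runtime with `String.fromCharCode(0x3000)`, so the snippet itself contains no irregular whitespace. `CJK_REGEX_RAW`, `IDEOGRAPHIC_SPACE` and `sameMatches` are illustrative names only.

```typescript
// Escaped form from the second commit: the source text contains no raw U+3000.
const CJK_REGEX = /[\u3000-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]/;

// Illustrative variant whose class starts with a literal ideographic space.
// The template literal turns each \uXXXX escape into the actual character,
// so the RegExp pattern receives raw characters, as the original literal did.
const IDEOGRAPHIC_SPACE = String.fromCharCode(0x3000);
const CJK_REGEX_RAW = new RegExp(
  `[${IDEOGRAPHIC_SPACE}-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]`,
);

// Returns true when both patterns agree on every sample string.
function sameMatches(samples: string[]): boolean {
  return samples.every((s) => CJK_REGEX.test(s) === CJK_REGEX_RAW.test(s));
}

const samples = [
  IDEOGRAPHIC_SPACE, // U+3000, start of the first range
  'あ', // Hiragana
  'カ', // Katakana
  '日', // CJK Unified Ideograph
  '\uD55C', // 한, Hangul syllable
  '\uFF01', // fullwidth exclamation mark, start of the third range
  'a',
  ' ',
  'é',
];

console.log(sameMatches(samples)); // true
```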
@i-anubhav-anand Thanks for the PR, but we do already have an open PR that resolves this same thing (see #581). I'd suggest reviewing that PR, and if you have comments or concerns with the approach, it's best to leave those there instead of opening this duplicate PR.
Thanks @dkotter, you're right: this overlaps the existing PR you linked. Closing in favor of it to keep the work consolidated; happy to help review or iterate there instead. Apologies for the duplicated effort!
Problem
Fixes #578
Japanese, Chinese, and Korean text doesn't use spaces as word separators.
`@wordpress/wordcount`'s `count( text, 'words', {} )` returns near-zero for CJK content, so the "Shorten" action always showed the error "Text is too short to shorten further." even for long paragraphs.
Solution
Detect CJK content via a Unicode range regex and switch to `'characters_excluding_spaces'` counting with a character-based minimum threshold (`SHORTEN_MIN_CHARS = 10`). The same locale-aware counting is applied to the word-diff display so the +/− indicator remains meaningful for CJK text.
Non-CJK content is unaffected: the existing `SHORTEN_MIN_WORDS = 5` path runs as before.
Changes
`src/experiments/content-resizing/components/ContentResizingToolbar.tsx`
- `CJK_REGEX` and `SHORTEN_MIN_CHARS` constants
- `handleAction('shorten')`: use `characters_excluding_spaces` count for CJK content
- `wordDiff` memo: use locale-aware count for accurate +/− display
Testing
`これはテストです。日本語のコンテンツをテストしています。` (Japanese: "This is a test. We are testing Japanese content.")
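This sketch applies the +/− indicator listed under Changes to the Japanese sample above. It assumes the counting unit is chosen once, from the original text, so both versions are measured the same way. `unitDiff`, `formatDiff` and the whitespace-based counters are hypothetical stand-ins, not the actual `wordDiff` memo or `@wordpress/wordcount` itself.

```typescript
const CJK_REGEX = /[\u3000-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]/;

// Simplified stand-ins for @wordpress/wordcount's count(); not the library's rules.
function countWords(text: string): number {
  return text.split(/\s+/).filter((word) => word.length > 0).length;
}

function countCharsExcludingSpaces(text: string): number {
  return text.replace(/\s/g, '').length;
}

type DiffUnit = 'words' | 'characters';

// Hypothetical locale-aware diff: the unit is picked once, from the original
// text, so the original and revised versions are always measured the same way.
function unitDiff(original: string, revised: string): { diff: number; unit: DiffUnit } {
  const cjk = CJK_REGEX.test(original);
  const measure = cjk ? countCharsExcludingSpaces : countWords;
  return { diff: measure(revised) - measure(original), unit: cjk ? 'characters' : 'words' };
}

// Hypothetical display helper: prefix gains with "+"; losses already carry "-".
function formatDiff(diff: number): string {
  return diff > 0 ? `+${diff}` : `${diff}`;
}

const jpResult = unitDiff(
  'これはテストです。日本語のコンテンツをテストしています。',
  '日本語のコンテンツをテストしています。',
);
console.log(formatDiff(jpResult.diff), jpResult.unit); // -9 characters
```

Measured in whitespace-delimited words, both Japanese strings count as one "word", so the indicator would read 0. Measured in characters, it shows the 9-character reduction.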