Fix content resizing "Shorten" action for CJK languages #715

Closed
i-anubhav-anand wants to merge 2 commits into WordPress:develop from i-anubhav-anand:fix/content-resizing-cjk-word-count

Conversation

i-anubhav-anand commented Jun 12, 2026

Contributor

Problem

Fixes #578

Japanese, Chinese, and Korean text doesn't use spaces as word separators, so `count( text, 'words', {} )` from `@wordpress/wordcount` returns a near-zero count (often 0 or 1) for CJK content. As a result, the "Shorten" action always showed the error "Text is too short to shorten further.", even for long paragraphs.
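
To make the failure mode concrete, here is a minimal sketch of whitespace-based word counting. `countWordsByWhitespace` is a hypothetical stand-in written for this illustration, not the `@wordpress/wordcount` implementation, and the real library's result may differ slightly; the point is only that a long Japanese paragraph with no spaces collapses into a single "word".

```typescript
// Hypothetical stand-in for a whitespace-based word counter. This is NOT the
// @wordpress/wordcount implementation; it only illustrates the failure mode.
function countWordsByWhitespace( text: string ): number {
	const trimmed = text.trim();
	return trimmed === '' ? 0 : trimmed.split( /\s+/ ).length;
}

const englishSample = 'This is a test of the content resizing feature.';
const japaneseSample = 'これはテストです。日本語のコンテンツをテストしています。';

console.log( countWordsByWhitespace( englishSample ) ); // 9
console.log( countWordsByWhitespace( japaneseSample ) ); // 1
```

With a five-word minimum, the Japanese sample would be rejected as too short even though it contains two full sentences.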

Solution

Detect CJK content with a Unicode-range regex and, when it matches, count with `'characters_excluding_spaces'` against a character-based minimum threshold (`SHORTEN_MIN_CHARS = 10`) instead of counting words. The same CJK-aware counting is applied to the word-diff display so the +/− indicator remains meaningful for CJK text.

Non-CJK content is unaffected: the existing `SHORTEN_MIN_WORDS = 5` path runs as before.
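
The branching described above could look roughly like the sketch below. The constant names and the regex ranges come from this PR; everything else is an assumption for illustration. In particular, `countWords` and `countCharsExcludingSpaces` are hypothetical stand-ins for the `@wordpress/wordcount` calls, `isLongEnoughToShorten` is not a function in the component, and treating "below the minimum" as too short (rather than "at or below") is a guess.

```typescript
// Escaped ranges quoted in this PR's second commit message.
const CJK_REGEX = /[\u3000-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]/;
const SHORTEN_MIN_WORDS = 5; // existing word-based minimum
const SHORTEN_MIN_CHARS = 10; // character-based minimum for CJK text

// Hypothetical stand-in for count( text, 'words', {} ).
function countWords( text: string ): number {
	const trimmed = text.trim();
	return trimmed === '' ? 0 : trimmed.split( /\s+/ ).length;
}

// Hypothetical stand-in for count( text, 'characters_excluding_spaces', {} ).
// Note: .length counts UTF-16 code units, which is fine for the BMP
// characters covered by CJK_REGEX.
function countCharsExcludingSpaces( text: string ): number {
	return text.replace( /\s/g, '' ).length;
}

// Assumption: text below the relevant minimum is rejected as too short.
function isLongEnoughToShorten( text: string ): boolean {
	if ( CJK_REGEX.test( text ) ) {
		return countCharsExcludingSpaces( text ) >= SHORTEN_MIN_CHARS;
	}
	return countWords( text ) >= SHORTEN_MIN_WORDS;
}
```

Under these assumptions the Japanese testing sample (28 characters) passes the check, while a five-character string such as こんにちは does not.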

Changes

  • src/experiments/content-resizing/components/ContentResizingToolbar.tsx
    • Add CJK_REGEX and SHORTEN_MIN_CHARS constants
    • handleAction('shorten'): use characters_excluding_spaces count for CJK content
    • wordDiff memo: use locale-aware count for accurate +/− display
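
For the word-diff change, a rough sketch of a length delta that picks its unit based on the text is shown here. This is not the code from `ContentResizingToolbar.tsx`: `countUnits`, `lengthDiff`, and `formatDiff` are hypothetical names, and switching to character counting when either the original or the resized text contains CJK is an assumption about how the unit is chosen.

```typescript
const CJK_REGEX = /[\u3000-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]/;

// Hypothetical helper: counts non-whitespace characters when useChars is
// true, otherwise whitespace-separated words.
function countUnits( text: string, useChars: boolean ): number {
	const trimmed = text.trim();
	if ( useChars ) {
		return trimmed.replace( /\s/g, '' ).length;
	}
	return trimmed === '' ? 0 : trimmed.split( /\s+/ ).length;
}

// Assumption: use character counting if either text contains CJK, so both
// sides of the subtraction are measured in the same unit.
function lengthDiff( original: string, resized: string ): number {
	const useChars = CJK_REGEX.test( original ) || CJK_REGEX.test( resized );
	return countUnits( resized, useChars ) - countUnits( original, useChars );
}

// Formats the delta for a +/− badge, using U+2212 for the minus sign.
function formatDiff( diff: number ): string {
	if ( diff > 0 ) {
		return `+${ diff }`;
	}
	return diff < 0 ? `\u2212${ Math.abs( diff ) }` : '0';
}
```

Measuring both texts in the same unit matters here: mixing a word count for one side with a character count for the other would make the badge meaningless.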

Testing

  1. Create a post with Japanese/Chinese/Korean paragraph text (e.g. これはテストです。日本語のコンテンツをテストしています。)
  2. Select a text block and open the AI resize menu
  3. Click Shorten — it should proceed without the "too short" error
  4. Verify the word-diff badge shows a reasonable character delta
  5. With English text, verify the existing behaviour is unchanged

i-anubhav-anand and others added 2 commits June 12, 2026 17:10
For languages like Japanese, Chinese, and Korean that don't use spaces
as word separators, `count( text, 'words', {} )` returns a very small
number (often 0 or 1), causing the "Text is too short to shorten
further." error even for long paragraphs.

Detect CJK content and use `characters_excluding_spaces` count with a
character-based minimum threshold instead. Apply the same locale-aware
counting in the word-diff display so the +/- indicator remains
meaningful for CJK text.

The regex literal contained a raw ideographic space (U+3000) as the
start of its first character range, which ESLint's
no-irregular-whitespace rule rejects. Escaped ranges are equivalent
and easier to review: \u3000-\u9FFF, \uAC00-\uD7FF, \uFF01-\uFF60.
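
As a quick check of what the escaped ranges cover, the sketch below assembles them into a single character class and tests a few sample strings. The combined literal is an assumption about how the three ranges are joined in `CJK_REGEX`, and `matchesCjk` is a hypothetical helper for this illustration.

```typescript
// Assumed form of the regex: the three escaped ranges in one character class.
const CJK_REGEX = /[\u3000-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]/;

// Hypothetical helper. The regex has no `g` flag, so test() keeps no state
// between calls.
function matchesCjk( text: string ): boolean {
	return CJK_REGEX.test( text );
}

const samples: Array< [ string, string ] > = [
	[ 'ideographic space (U+3000)', '\u3000' ],
	[ 'hiragana', 'これ' ],
	[ 'katakana', 'テスト' ],
	[ 'han ideographs', '中文' ],
	[ 'hangul syllables', '한국어' ],
	[ 'fullwidth punctuation', '！' ],
	[ 'latin', 'Hello' ],
];

samples.forEach( ( [ label, text ] ) => {
	console.log( `${ label }: ${ matchesCjk( text ) }` );
} );
```

Every sample except the Latin one matches. Because the first range starts at U+3000, a string containing only an ideographic space is also detected as CJK.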

i-anubhav-anand marked this pull request as ready for review June 12, 2026 20:02

github-actions Bot commented Jun 12, 2026


The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: i-anubhav-anand <anubhav24@git.wordpress.org>
Co-authored-by: dkotter <dkotter@git.wordpress.org>
Co-authored-by: t-hamano <wildworks@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

dkotter commented Jun 12, 2026

Collaborator

@i-anubhav-anand Thanks for the PR but we do already have an open PR that resolves this same thing (see #581). I'd suggest reviewing that PR and if you have comments or concerns with the approach, best to leave those there instead of opening this duplicate PR.

@i-anubhav-anand

Contributor Author

Thanks @dkotter — you're right, this overlaps the existing PR you linked. Closing in favor of it to keep the work consolidated; happy to help review or iterate there instead. Apologies for the duplicated effort!

Development

Successfully merging this pull request may close these issues.

Content Resizing: The "shorten" action does not detect the character length of Japanese text

2 participants