Fix content classification minimum content check for CJK languages by i-anubhav-anand · Pull Request #716 · WordPress/ai

i-anubhav-anand · 2026-06-12T11:43:09Z

Problem

Fixes #571

Japanese, Chinese, and Korean don't separate words with spaces. count( text, 'words' ) returns near-zero for CJK content, so hasEnoughContent was always false for CJK users — the classification panel was permanently disabled even with long articles.
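To make the failure mode concrete, here is a minimal sketch of whitespace-based word counting. `naiveWordCount` is a hypothetical stand-in, not the actual implementation behind `count( text, 'words' )` in `@wordpress/wordcount`; it only illustrates why text without spaces collapses to a single "word".

```typescript
// Hypothetical stand-in for a whitespace-based word counter. The real
// count( text, 'words' ) does more preprocessing; this only shows the
// effect of splitting on whitespace.
function naiveWordCount(text: string): number {
  const trimmed = text.trim();
  if (trimmed === '') {
    return 0;
  }
  return trimmed.split(/\s+/).length;
}

const english = 'The quick brown fox jumps over the lazy dog';
// Three full Japanese sentences with no spaces between words.
const japanese = '吾輩は猫である。名前はまだ無い。どこで生れたかとんと見当がつかぬ。';

const englishWords = naiveWordCount(english);   // 9
const japaneseWords = naiveWordCount(japanese); // 1
```

With a 150-word minimum, a CJK article of any realistic length stays far below the threshold under this kind of counting.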

Solution

Detect CJK content via a Unicode range regex and switch to 'characters_excluding_spaces' counting. The same MINIMUM_WORD_COUNT = 150 threshold is reused, which is meaningful in a CJK context (150 characters is roughly a short paragraph).

Non-CJK content continues to use word-based counting as before.
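The branching described above could look roughly like the sketch below. It is not the code from this PR: `countWords` and `countCharactersExcludingSpaces` are hypothetical stand-ins for the two `count()` modes, `hasEnoughContent` is written as a plain function rather than as part of the hook, the character-class form of `CJK_REGEX` is assumed from the ranges quoted in the follow-up commit, and the `>=` comparison is an assumption based on the "150+" wording in the testing steps.

```typescript
// Threshold name and value come from the PR description.
const MINIMUM_WORD_COUNT = 150;

// Ranges quoted in the PR's follow-up commit message; the exact
// literal used in the PR is not shown, so this form is an assumption.
const CJK_REGEX = /[\u3000-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]/;

// Hypothetical stand-in for count( text, 'words' ).
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical stand-in for count( text, 'characters_excluding_spaces' ).
function countCharactersExcludingSpaces(text: string): number {
  return text.replace(/\s/g, '').length;
}

// Hypothetical helper: character count for CJK text, word count otherwise.
function hasEnoughContent(text: string): boolean {
  const units = CJK_REGEX.test(text)
    ? countCharactersExcludingSpaces(text)
    : countWords(text);
  return units >= MINIMUM_WORD_COUNT;
}
```

One design consequence of this sketch: a single CJK character anywhere in the text switches the whole post to character counting, so a mostly English post containing one Japanese term would pass at 150 characters. The description does not say how the PR handles mixed content.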

Changes

src/experiments/content-classification/components/useContentClassification.ts
- Add CJK_REGEX constant
- hasEnoughContent: detect CJK and use character count instead of word count

Testing

1. Create a post with 150+ Japanese/Chinese/Korean characters.
2. Open the Content Classification panel; the Generate button should be enabled.
3. Verify English posts still require 150+ words before the button enables.

Japanese, Chinese, and Korean don't separate words with spaces, so `count( text, 'words' )` returns near-zero for these languages even for long paragraphs. This makes `hasEnoughContent` always false, blocking classification for CJK users. Detect CJK content and switch to `characters_excluding_spaces` counting with the same 150-unit threshold, which is meaningful for CJK text (≈ a short paragraph).

The regex literal contained a raw ideographic space (U+3000) as the start of its first character range, which ESLint's no-irregular-whitespace rule rejects. Escaped ranges are equivalent and easier to review: \u3000-\u9FFF, \uAC00-\uD7FF, \uFF01-\uFF60.
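For reviewers, a small sketch of what the escaped ranges cover. The three ranges are copied from the commit message above; wrapping them in a single character class is an assumption about the shape of the literal, and the sample characters are illustrative choices, not test data from the PR.

```typescript
// Ranges from the commit message, written with \u escapes so the source
// contains no raw U+3000 for no-irregular-whitespace to flag.
const CJK_REGEX = /[\u3000-\u9FFF\uAC00-\uD7FF\uFF01-\uFF60]/;

// Illustrative samples, also written as escapes.
const samples: { label: string; text: string; expected: boolean }[] = [
  { label: 'ideographic space (U+3000)', text: '\u3000', expected: true },
  { label: 'hiragana a (U+3042)', text: '\u3042', expected: true },
  { label: 'CJK ideograph (U+4E2D)', text: '\u4E2D', expected: true },
  { label: 'Hangul syllable (U+D55C)', text: '\uD55C', expected: true },
  { label: 'fullwidth exclamation mark (U+FF01)', text: '\uFF01', expected: true },
  { label: 'plain ASCII', text: 'hello world', expected: false },
];

// true for every sample whose match result agrees with the expectation.
const results = samples.map((s) => CJK_REGEX.test(s.text) === s.expected);
```

Note that the first range still starts at U+3000, so text containing only an ideographic space is detected as CJK under this pattern.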

github-actions · 2026-06-12T20:02:31Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: i-anubhav-anand <anubhav24@git.wordpress.org>
Co-authored-by: dkotter <dkotter@git.wordpress.org>
Co-authored-by: t-hamano <wildworks@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

dkotter · 2026-06-12T20:10:20Z

@i-anubhav-anand This is another PR that duplicates effort in #581. Please review that PR and let's try and keep work there to avoid duplication

i-anubhav-anand · 2026-06-15T14:20:56Z

Thanks @dkotter — you're right, this overlaps the existing PR you linked. Closing in favor of it to keep the work consolidated; happy to help review or iterate there instead. Apologies for the duplicated effort!

i-anubhav-anand and others added 2 commits June 12, 2026 17:11

i-anubhav-anand marked this pull request as ready for review June 12, 2026 20:02

dkotter assigned i-anubhav-anand Jun 12, 2026

i-anubhav-anand closed this Jun 15, 2026

dkotter mentioned this pull request Jun 15, 2026

Content Classification: Fix minimum-content threshold for CJK languages #728

Closed

i-anubhav-anand wants to merge 2 commits into WordPress:develop from i-anubhav-anand:fix/content-classification-cjk-word-count
