
refactor: migrate to icu_normalizer for Unicode normalization #7

Merged
nnunley merged 1 commit into forest-rs:main from nnunley:icu-normalizer
Apr 6, 2026


@nnunley nnunley commented Apr 6, 2026

Summary

  • Replace unicode-normalization crate with icu_normalizer (icu4x) for NFC/NFKC normalization, consolidating all Unicode operations under the icu4x family
  • Return Cow<str> from apply_canonical_form and apply_case_mapping to avoid allocation when text is already normalized
  • Fix no_std clippy pre-push hook to exclude example crates, matching CI workflow

Details

The project already uses icu_casemap for case mapping. This change drops the unicode-normalization dependency (and its transitive tinyvec) in favor of icu_normalizer, which shares the existing icu4x data tables.

The icu_normalizer::ComposingNormalizer API returns Cow<str>, enabling a zero-allocation fast path when text is already in NFC/NFKC form. Both helper functions now propagate Cow instead of unconditionally allocating String.
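The borrow-when-unchanged pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `lowercase_ascii`, `expand_tabs`, and `pipeline` are hypothetical stand-ins for the real helpers, and the ASCII transformations stand in for actual Unicode normalization and case mapping.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a case-mapping step: lowercases ASCII letters.
/// Borrows the input when nothing needs to change; allocates only otherwise.
fn lowercase_ascii(text: &str) -> Cow<'_, str> {
    if text.bytes().any(|b| b.is_ascii_uppercase()) {
        Cow::Owned(text.to_ascii_lowercase())
    } else {
        Cow::Borrowed(text)
    }
}

/// Hypothetical stand-in for a normalization step: replaces tabs with spaces.
fn expand_tabs(text: &str) -> Cow<'_, str> {
    if text.contains('\t') {
        Cow::Owned(text.replace('\t', " "))
    } else {
        Cow::Borrowed(text)
    }
}

/// Chains both steps while preserving the fast path: if neither step
/// changes the text, the result still borrows from the original input.
fn pipeline(text: &str) -> Cow<'_, str> {
    match lowercase_ascii(text) {
        // First step was a no-op, so the second step can borrow from `text`.
        Cow::Borrowed(s) => expand_tabs(s),
        // First step allocated; the result must own its data because it
        // cannot borrow from the temporary `String`.
        Cow::Owned(s) => Cow::Owned(expand_tabs(&s).into_owned()),
    }
}
```

Callers that only read the result can use it through `Deref<Target = str>`. Callers that need ownership can call `into_owned()`, which allocates only in the borrowed case.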

Test plan

  • All existing leit_text tests pass (10/10)
  • Full workspace cargo test --all-features passes
  • cargo clippy --workspace --all-features --all-targets clean
  • no_std clippy check passes
  • wasm32 clippy check passes

… icu_normalizer

Replace the unicode-normalization crate with icu_normalizer (icu4x) for
NFC/NFKC normalization. This consolidates all Unicode operations under the
icu4x family, which is already used for case mapping via icu_casemap.

The icu_normalizer API returns Cow<str>, allowing us to avoid allocation
when text is already in the target canonical form. Both apply_canonical_form
and apply_case_mapping now return Cow<str> instead of String.

Also fix the no_std clippy pre-push hook to exclude example crates,
matching the CI workflow configuration.
@nnunley nnunley merged commit ab12727 into forest-rs:main Apr 6, 2026
14 checks passed
@nnunley nnunley deleted the icu-normalizer branch April 6, 2026 02:14
