perf(transliterate): fuse the compose pre-scan into the engine pass #497
Recovers most of the form-invariance (#475/#477/#481) transliterate regression.

The compose-at-lookup boundary added a `needs_composition(text)` pre-scan that walked the *entire* string a second time (UTF-8 decode + `is_combining_mark` trie lookup per char) before the actual transliteration, so every non-ASCII call paid two full passes where it used to pay one. On the perf-results bucket the latin/unidecode speedup ratio fell ~18 -> ~11 (PR #474 -> #480), with the ratio against every comparator dropping proportionally (so the slowdown is in disarm, not in the comparators). This is exactly the 2026-06-20 review's M-3 finding.

Fix: detect composition *inside* the engine's existing decode loop instead of in a separate pass. `transliterate_run` now takes `detect_compose` and returns `Option`; on the fast (borrowed-input) attempt it bails the instant it decodes a combining mark or conjoining jamo, using the scalar it has already decoded, and the caller redoes the work over the composed buffer (with `detect_compose = false`, so a leftover lone mark there doesn't re-trigger). Mark-free input, the common case, now completes in one pass with no second scan.

Also replaces the `is_combining_mark` trie lookup in `needs_composition` (still used by the confusables path) with a `could_compose` range fast-path over U+0000–058F, where marks live only in two small sub-blocks.

Behaviour is identical (the bail triggers on exactly the chars `needs_composition` did); verified by the full form-invariance suite, the 16 gated exhaustive tests (all Hangul / BMP / CJK / Indic), tier-1 (21 binaries), formal invariants, and 949 Python transliterate/confusables/form-invariance tests.

Rust-level micro-bench (no PyO3), pre -> post: latin 6.7 -> 4.5 (-33%), cyrillic 13.7 -> 11.0 (-20%), greek 12.9 -> 11.4 (-12%), mixed 10.0 -> 7.9 (-21%) ns/char.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Richard Quinn <quinn.richard@gmail.com>
Pull request overview
This PR improves transliterate hot-path performance by eliminating the extra needs_composition(text) pre-scan and instead detecting “compose-required” characters inside the transliteration engine’s existing UTF-8 decode loop, preserving the compose-at-lookup form-invariance behavior while recovering most of the regression introduced by the boundary compose work.
Changes:
- Fuse compose detection into the transliteration engine pass: attempt a borrowed-input run first and bail to a composed-buffer rerun only when a compose-triggering scalar is encountered.
- Thread a `detect_compose` flag through `transliterate_dispatch` / `transliterate_run`, switching them to return `Option<Cow<str>>` so the fast pass can signal “needs compose”.
- Speed up `needs_composition` with a new `could_compose` predicate that adds a cheap range fast-path over U+0000–058F; document the perf recovery in the changelog.
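
The bail-and-retry control flow described in the changes above can be sketched roughly as follows. This is a toy illustration, not the code in `src/transliterate.rs`: `run`, `transliterate_sketch`, `triggers_compose`, `map_char`, and `compose` are all hypothetical names, the mapping table has two entries, and the composer only handles `e` + U+0301. Only the shape is taken from the PR: a `detect_compose` flag, an `Option<Cow<str>>` return where `None` means "needs compose", and a rerun over the composed buffer with detection switched off.

```rust
use std::borrow::Cow;

/// Hypothetical predicate: scalars that force the compose path
/// (combining marks in two low sub-blocks, plus Hangul leading jamo).
fn triggers_compose(c: char) -> bool {
    matches!(c as u32, 0x0300..=0x036F | 0x0483..=0x0489 | 0x1100..=0x1112)
}

/// Hypothetical per-scalar mapping; a stand-in for the real tables.
fn map_char(c: char) -> Option<&'static str> {
    match c {
        'é' => Some("e"),
        'ß' => Some("ss"),
        _ => None,
    }
}

/// Toy composer: folds 'e' + U+0301 into precomposed 'é' and nothing else.
fn compose(text: &str) -> String {
    text.replace("e\u{0301}", "é")
}

/// One decode pass. Returns `None` only as the compose-bail signal,
/// and only when `detect_compose` is true; it never means "no output".
fn run<'a>(text: &'a str, detect_compose: bool) -> Option<Cow<'a, str>> {
    let mut out: Option<String> = None;
    for (i, c) in text.char_indices() {
        // The check reuses the scalar the loop has already decoded.
        if detect_compose && triggers_compose(c) {
            return None;
        }
        match map_char(c) {
            // First replacement: start an owned buffer from the untouched prefix.
            Some(rep) => out.get_or_insert_with(|| text[..i].to_owned()).push_str(rep),
            None => {
                if let Some(buf) = out.as_mut() {
                    buf.push(c);
                }
            }
        }
    }
    Some(match out {
        Some(s) => Cow::Owned(s),
        None => Cow::Borrowed(text),
    })
}

/// Fast attempt over the borrowed input; on bail, compose and redo the work.
fn transliterate_sketch(text: &str) -> Cow<'_, str> {
    if let Some(fast) = run(text, true) {
        return fast;
    }
    let composed = compose(text);
    // Detection is off here, so a leftover lone mark cannot re-trigger the bail.
    let redone = run(&composed, false).expect("run never bails with detect_compose = false");
    Cow::Owned(redone.into_owned())
}
```

In this sketch, mark-free input such as `"abc"` is handled in a single pass and returned borrowed, while `"cafe\u{301}"` bails on the combining acute, is composed, and is then transliterated in a second pass.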
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/transliterate.rs` | Implements the fused fast-path attempt, adds `detect_compose` plumbing, and makes the engine bail to compose only when needed. |
| `src/compose.rs` | Adds `could_compose` and updates `needs_composition` to use it, reducing per-scalar overhead in common ranges. |
| `CHANGELOG.md` | Records the performance regression root cause and the fused-pass recovery. |
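
For readers unfamiliar with the range fast-path idea behind `could_compose`, here is a minimal sketch. It is not the code in `src/compose.rs`: the function names are hypothetical, the slow-path lookup is a deliberately partial stand-in for the trie, and the Hangul L jamo range used (U+1100–1112, the leading consonants that take part in algorithmic Hangul composition) is an assumption about what the real predicate checks. The two low sub-blocks are the Unicode ranges that hold combining marks below U+0590: U+0300–036F and U+0483–0489.

```rust
/// Hypothetical stand-in for the trie-backed lookup. Deliberately partial:
/// a few well-known mark ranges, just enough to make the sketch runnable.
fn is_combining_mark_slow(c: char) -> bool {
    matches!(c as u32, 0x0591..=0x05BD | 0x064B..=0x065F | 0x20D0..=0x20F0)
}

/// Sketch of a range fast-path in front of the slower lookup.
fn could_compose_sketch(c: char) -> bool {
    let cp = c as u32;
    if cp <= 0x058F {
        // Latin, Greek, Cyrillic, Armenian: marks occur only in
        // U+0300..=U+036F and U+0483..=U+0489, so two range checks suffice.
        return matches!(cp, 0x0300..=0x036F | 0x0483..=0x0489);
    }
    // Assumed range for conjoining Hangul leading (L) jamo.
    if (0x1100..=0x1112).contains(&cp) {
        return true;
    }
    // Everything else falls through to the (here: toy) slow lookup.
    is_combining_mark_slow(c)
}

/// Sketch of the pre-scan predicate built on top of it.
fn needs_composition_sketch(text: &str) -> bool {
    text.chars().any(could_compose_sketch)
}
```

The point of the fast-path is that most Latin, Greek, and Cyrillic text is answered by two integer comparisons per scalar and never reaches the table lookup.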
…gnal Copilot (#497): say 'conjoining Hangul L jamo' (the precise `could_compose` predicate), not the broader 'conjoining jamo'; and document that `transliterate_dispatch` returns `None` only as the compose-bail signal (with detect_compose=true), never as 'no output'. Comment-only. Assisted-by: Claude Code:claude-opus-4-8 Signed-off-by: Richard Quinn <quinn.richard@gmail.com>
Review comments addressed in abdad76 (comment-only): the two fusion comments now read "combining mark or conjoining Hangul L jamo (the |
…try into 0.11.0 Two 0.11 release-prep touch-ups per RELEASING.md: - BINDINGS.md gains the per-binding<->core compatibility note (RELEASING.md 'Across languages'): all registries wrap core 0.11 at 0.11, with the per-registry-patch drift rule spelled out. Cheap insurance ahead of the npm binding's independent patch lane. - CHANGELOG: move the transliterate compose-fusion perf entry (#497) out of [Unreleased] and into [0.11.0], since 0.11.0 is not yet tagged — it ships in the release. Assisted-by: Claude Code:claude-opus-4-8 Signed-off-by: Richard Quinn <quinn.richard@gmail.com>
…g-fold docs(release): binding↔core compatibility note + fold #497 perf into 0.11.0
Recovers most of the form-invariance transliterate regression the perf-results branch flagged.
Root cause
The compose-at-lookup boundary (#475/#477/#481) added a `needs_composition(text)` pre-scan that walked every non-ASCII input a second time (UTF-8 decode + `is_combining_mark` trie lookup per character) before the actual transliteration. So the hot path went from one pass to two. On the perf-results bucket (EPYC 7763 / CPython 3.12.13 / corpus `803c316e`) the latin/unidecode speedup ratio fell ~18.3 → ~11.4 across PR #474 → #480, all version-pinned comparators dropping proportionally (so it's disarm, not the comparators). This is the 2026-06-20 review's M-3 finding.

Fix
Detect composition inside the engine's existing decode loop instead of in a separate pass:
- `transliterate_run` takes `detect_compose` and returns `Option`. On the fast (borrowed-input) attempt it bails the instant it decodes a combining mark / conjoining jamo, using the scalar it has already decoded, and the caller redoes the work over the composed buffer (with `detect_compose = false`, so a leftover lone mark there doesn't re-trigger). Mark-free input (the common case) now completes in one pass.
- `needs_composition` (still used by the confusables fold) also drops the trie lookup for a `could_compose` range fast-path over U+0000–058F, where marks live only in two small sub-blocks.

Correctness
Behaviour is identical: the bail fires on exactly the chars `needs_composition` did. Verified: ≇/≅ etc.

Measurements (Rust-level micro-bench, no PyO3 dilution; pre → post)
The CI perf-gate will quantify the recovery on the canonical bucket via the comparator ratio (which cancels per-call overhead).
🤖 Generated with Claude Code