
perf(transliterate): fuse the compose pre-scan into the engine pass #497

Merged: raeq merged 3 commits into main from perf/fuse-compose-scan on Jun 21, 2026

raeq (Owner) commented Jun 21, 2026

Recovers most of the form-invariance transliterate regression the perf-results branch flagged.

Root cause

The compose-at-lookup boundary (#475/#477/#481) added a needs_composition(text) pre-scan that walked every non-ASCII input a second time (UTF-8 decode + is_combining_mark trie lookup per character) before the actual transliteration, so the hot path went from one pass to two. On the perf-results bucket (EPYC 7763 / CPython 3.12.13 / corpus 803c316e) the latin/unidecode speedup ratio fell ~18.3 → ~11.4 across PRs #474 → #480, with all version-pinned comparators dropping proportionally (so the regression is in disarm, not the comparators). This is the 2026-06-20 review's M-3 finding.

Fix

Detect composition inside the engine's existing decode loop instead of in a separate pass:

  • transliterate_run takes detect_compose and returns Option<Cow<str>>. On the fast (borrowed-input) attempt it bails the instant it decodes a combining mark or conjoining Hangul L jamo, testing the scalar it has already decoded, and the caller redoes the work over the composed buffer (with detect_compose = false, so a leftover lone mark there doesn't re-trigger). Mark-free input (the common case) now completes in one pass.
  • needs_composition (still used by the confusables fold) replaces its is_combining_mark trie lookup with a could_compose range fast-path over U+0000–058F, where marks live only in two small sub-blocks.
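
The control flow described in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in src/transliterate.rs: the bodies of `could_compose`, `map_scalar`, and `compose` are hypothetical toy stand-ins, the ASCII early return is an assumption added only to show the borrowed case, and only the names `transliterate_run`, `detect_compose`, and the `Option<Cow<str>>` return shape come from the PR.

```rust
use std::borrow::Cow;

/// Toy stand-in for the real predicate (assumption): combining diacritical
/// marks U+0300..=U+036F and Hangul L jamo U+1100..=U+1112 only.
fn could_compose(c: char) -> bool {
    matches!(c, '\u{0300}'..='\u{036F}' | '\u{1100}'..='\u{1112}')
}

/// Toy per-scalar mapping (hypothetical): precomposed e-acute becomes "e",
/// every other scalar is copied through unchanged.
fn map_scalar(c: char, out: &mut String) {
    match c {
        '\u{00E9}' => out.push('e'),
        _ => out.push(c),
    }
}

/// One pass over the input. `None` is returned only as the
/// "input needs composing first" signal, and only when `detect_compose` is true.
fn transliterate_run(text: &str, detect_compose: bool) -> Option<Cow<'_, str>> {
    if text.is_ascii() {
        // Assumed shortcut: pure ASCII needs no work, so borrow the input.
        return Some(Cow::Borrowed(text));
    }
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        // The check reuses the scalar the loop has already decoded.
        if detect_compose && could_compose(c) {
            return None;
        }
        map_scalar(c, &mut out);
    }
    Some(Cow::Owned(out))
}

/// Toy composition (hypothetical): handles only "e" + U+0301.
fn compose(text: &str) -> String {
    text.replace("e\u{0301}", "\u{00E9}")
}

/// Caller: try the fused fast pass, and redo the work over the composed
/// buffer with detection switched off if the fast pass bailed.
fn transliterate(text: &str) -> String {
    match transliterate_run(text, true) {
        Some(done) => done.into_owned(),
        None => {
            let composed = compose(text);
            let out = transliterate_run(&composed, false)
                .expect("detect_compose = false never bails")
                .into_owned();
            out
        }
    }
}
```

With these toy tables, `transliterate("cafe\u{0301}")` bails on the combining acute, composes, and reruns, while `transliterate("caf\u{00E9}")` finishes in the single fast pass; both yield `"cafe"`.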

Correctness

Behaviour is identical — the bail fires on exactly the chars needs_composition did. Verified:

  • Gated exhaustive (tier 3): 16/16 — all 11,172 Hangul syllables, full BMP, all CJK, 15 Indic blocks
  • Formal invariants: 14, tier-1: 21 binaries, 949 Python transliterate/confusables/form-invariance/surrogate tests
  • Confirmed pre/post-fusion output identical on / etc.

Measurements (Rust-level micro-bench, no PyO3 dilution; pre → post)

| corpus | before (ns/char) | after (ns/char) | gain |
|---|---|---|---|
| latin | 6.7 | 4.5 | −33% |
| cyrillic | 13.7 | 11.0 | −20% |
| mixed | 10.0 | 7.9 | −21% |
| greek | 12.9 | 11.4 | −12% |
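
For reference, the gain column is the relative change from before to after. The short sketch below recomputes it from the figures above (ns/char, per the commit message); the helper name `percent_change` is made up for this illustration.

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean the post-fusion run is faster.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// The (corpus, before, after) rows from the table above.
fn table_rows() -> [(&'static str, f64, f64); 4] {
    [
        ("latin", 6.7, 4.5),
        ("cyrillic", 13.7, 11.0),
        ("mixed", 10.0, 7.9),
        ("greek", 12.9, 11.4),
    ]
}

/// Rounded gains in table order.
fn rounded_gains() -> Vec<(&'static str, f64)> {
    table_rows()
        .iter()
        .map(|&(corpus, before, after)| (corpus, percent_change(before, after).round()))
        .collect()
}
```

Rounded to the nearest percent, the four rows come out as −33, −20, −21, and −12, matching the table.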

The CI perf-gate will quantify the recovery on the canonical bucket via the comparator ratio (which cancels per-call overhead).

Separate finding (not this PR): the #492 adversarial oracle (hypothesis-marked, dev-tier, excluded from CI) flags that ml_normalize('≇') is non-idempotent (≇ → ≅ → "approximately equal"). Confirmed pre-existing on main (identical pre/post fusion) — a symbol-naming/NFKC fixed-point issue, orthogonal to this change.
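
To make the idempotence property concrete, here is a small sketch of the check an oracle of this kind could perform. `toy_normalize` is a hypothetical stand-in that only mimics the reported ≇ → ≅ → "approximately equal" chain; it is not `ml_normalize`, and `is_idempotent` is a name invented for this example.

```rust
/// True when applying `f` twice gives the same result as applying it once.
fn is_idempotent<F: Fn(&str) -> String>(f: F, input: &str) -> bool {
    let once = f(input);
    let twice = f(&once);
    once == twice
}

/// Toy stand-in (NOT the real ml_normalize): each call rewrites one step of
/// the chain reported above, so the first output is not a fixed point.
fn toy_normalize(text: &str) -> String {
    let mut out = String::new();
    for c in text.chars() {
        match c {
            '≇' => out.push('≅'),
            '≅' => out.push_str("approximately equal"),
            _ => out.push(c),
        }
    }
    out
}
```

Under this toy mapping, `is_idempotent(toy_normalize, "≇")` is false because the first call yields "≅" and the second yields "approximately equal", whereas plain text such as "abc" passes.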

🤖 Generated with Claude Code

raeq added 2 commits June 21, 2026 21:06
Recovers most of the form-invariance (#475/#477/#481) transliterate regression. The
compose-at-lookup boundary added a `needs_composition(text)` pre-scan that walked the
*entire* string a second time (UTF-8 decode + `is_combining_mark` trie lookup per char)
before the actual transliteration — so every non-ASCII call paid two full passes where
it used to pay one. On the perf-results bucket the latin/unidecode speedup ratio fell
~18 -> ~11 (PR #474 -> #480), all comparators dropping proportionally (it's disarm, not
the comparators). This is exactly the 2026-06-20 review's M-3 finding.

Fix: detect composition *inside* the engine's existing decode loop instead of in a
separate pass. `transliterate_run` now takes `detect_compose` and returns `Option`; on
the fast (borrowed-input) attempt it bails the instant it decodes a combining mark or
conjoining jamo — off the scalar it already decoded — and the caller redoes the work
over the composed buffer (with `detect_compose = false` so a leftover lone mark there
doesn't re-trigger). Mark-free input — the common case — now completes in one pass with
no second scan. Also replaces the `is_combining_mark` trie lookup in `needs_composition`
(still used by the confusables path) with a `could_compose` range fast-path over
U+0000–058F, where marks live only in two small sub-blocks.

Behaviour is identical (the bail triggers on exactly the chars `needs_composition` did);
verified by the full form-invariance suite, the 16 gated exhaustive tests (all Hangul /
BMP / CJK / Indic), tier-1 (21 binaries), formal invariants, and 949 Python
transliterate/confusables/form-invariance tests. Rust-level micro-bench (no PyO3),
pre -> post: latin 6.7 -> 4.5 (-33%), cyrillic 13.7 -> 11.0 (-20%), greek 12.9 -> 11.4
(-12%), mixed 10.0 -> 7.9 (-21%) ns/char.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Richard Quinn <quinn.richard@gmail.com>
Copilot AI review requested due to automatic review settings June 21, 2026 19:12
@raeq raeq enabled auto-merge (squash) June 21, 2026 19:12

Copilot AI left a comment


Pull request overview

This PR improves transliterate hot-path performance by eliminating the extra needs_composition(text) pre-scan and instead detecting “compose-required” characters inside the transliteration engine’s existing UTF-8 decode loop, preserving the compose-at-lookup form-invariance behavior while recovering most of the regression introduced by the boundary compose work.

Changes:

  • Fuse compose detection into the transliteration engine pass: attempt a borrowed-input run first and bail to a composed-buffer rerun only when a compose-triggering scalar is encountered.
  • Thread a detect_compose flag through transliterate_dispatch/transliterate_run, switching them to return Option<Cow<str>> so the fast pass can signal “needs compose”.
  • Speed up needs_composition with a new could_compose predicate that adds a cheap range fast-path over U+0000–058F; document the perf recovery in the changelog.
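
A range fast-path of the kind described in the last bullet might look like the sketch below. Everything in it beyond the names `could_compose` and `needs_composition` and the U+0000–058F window is an assumption: the two sub-blocks are taken to be U+0300–036F and U+0483–0489, the Hangul L jamo range is taken to be U+1100–1112, and `is_combining_mark_slow` is a tiny hypothetical stand-in for the trie lookup.

```rust
/// Hypothetical stand-in for the trie-backed lookup. It covers only two
/// sample ranges (Hebrew accents/points, kana voicing marks), not all marks.
fn is_combining_mark_slow(c: char) -> bool {
    matches!(c, '\u{0591}'..='\u{05BD}' | '\u{3099}'..='\u{309A}')
}

/// Sketch of the predicate: a cheap range test for the common low code
/// points, falling back to the slower lookup only above U+058F.
fn could_compose(c: char) -> bool {
    let cp = c as u32;
    if cp <= 0x058F {
        // Assumed sub-blocks: combining diacritical marks and the
        // Cyrillic combining marks.
        return matches!(cp, 0x0300..=0x036F | 0x0483..=0x0489);
    }
    // Assumed Hangul L jamo range, then the general (slow) mark check.
    matches!(cp, 0x1100..=0x1112) || is_combining_mark_slow(c)
}

/// Scan used where a standalone answer is still needed
/// (the confusables path, per the PR description).
fn needs_composition(text: &str) -> bool {
    !text.is_ascii() && text.chars().any(could_compose)
}
```

The design point is that Latin, Greek, Cyrillic, and Armenian text resolves with two integer comparisons per scalar and never reaches the slower lookup.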

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/transliterate.rs | Implements the fused fast-path attempt, adds detect_compose plumbing, and makes the engine bail to compose only when needed. |
| src/compose.rs | Adds could_compose and updates needs_composition to use it, reducing per-scalar overhead in common ranges. |
| CHANGELOG.md | Records the performance regression root cause and the fused-pass recovery. |


Comment thread src/transliterate.rs
Comment thread src/transliterate.rs
Comment thread src/transliterate.rs
github-actions Bot added a commit that referenced this pull request Jun 21, 2026
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…gnal

Copilot (#497): say 'conjoining Hangul L jamo' (the precise `could_compose` predicate),
not the broader 'conjoining jamo'; and document that `transliterate_dispatch` returns
`None` only as the compose-bail signal (with detect_compose=true), never as 'no output'.
Comment-only.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Richard Quinn <quinn.richard@gmail.com>
raeq (Owner, Author) commented Jun 21, 2026

Review comments addressed in abdad76 (comment-only): the two fusion comments now read "combining mark or conjoining Hangul L jamo (the could_compose predicate)" instead of the broader "conjoining jamo", and transliterate_dispatch's doc now documents that None is returned only as the compose-bail control-flow signal (with detect_compose = true), never as a normal "no output" case.

@raeq raeq merged commit 462f560 into main Jun 21, 2026
21 checks passed
@raeq raeq deleted the perf/fuse-compose-scan branch June 21, 2026 19:44
github-actions Bot added a commit that referenced this pull request Jun 21, 2026
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
raeq added a commit that referenced this pull request Jun 21, 2026
…try into 0.11.0

Two 0.11 release-prep touch-ups per RELEASING.md:
- BINDINGS.md gains the per-binding<->core compatibility note (RELEASING.md 'Across
  languages'): all registries wrap core 0.11 at 0.11, with the per-registry-patch drift
  rule spelled out. Cheap insurance ahead of the npm binding's independent patch lane.
- CHANGELOG: move the transliterate compose-fusion perf entry (#497) out of [Unreleased]
  and into [0.11.0], since 0.11.0 is not yet tagged — it ships in the release.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Richard Quinn <quinn.richard@gmail.com>
raeq added a commit that referenced this pull request Jun 21, 2026
…g-fold

docs(release): binding↔core compatibility note + fold #497 perf into 0.11.0