perf(transliterate): fuse the compose pre-scan into the engine pass by raeq · Pull Request #497 · raeq/disarm

raeq · 2026-06-21T19:12:41Z

Recovers most of the transliterate regression that the form-invariance work (#475/#477/#481) introduced and the perf-results branch flagged.

Root cause

The compose-at-lookup boundary (#475/#477/#481) added a needs_composition(text) pre-scan that walked every non-ASCII input a second time (UTF-8 decode + is_combining_mark trie lookup per character) before the actual transliteration. So the hot path went from one pass to two. On the perf-results bucket (EPYC 7763 / CPython 3.12.13 / corpus 803c316e) the latin/unidecode speedup ratio fell ~18.3 → ~11.4 across PR #474 → #480, all version-pinned comparators dropping proportionally (so it's disarm, not the comparators). This is the 2026-06-20 review's M-3 finding.

Fix

Detect composition inside the engine's existing decode loop instead of in a separate pass:

transliterate_run takes detect_compose and returns Option<Cow<str>>, with None as the compose-bail signal. On the fast (borrowed-input) attempt it bails the instant it decodes a combining mark or conjoining Hangul L jamo (the could_compose predicate), using the scalar it already decoded, and the caller redoes the work over the composed buffer (with detect_compose = false, so a leftover lone mark there doesn't re-trigger). Mark-free input (the common case) now completes in one pass.
needs_composition (still used by the confusables fold) also replaces the is_combining_mark trie lookup with a could_compose range fast-path over U+0000–058F, where marks live only in two small sub-blocks.
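
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/transliterate.rs`: the `could_compose` ranges, the toy `compose` function (which handles only `e` + U+0301), and the `map_scalar` table are all hypothetical stand-ins chosen to keep the example self-contained.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the real predicate: combining diacritics
/// (U+0300..=U+036F) plus Hangul leading (L) jamo (U+1100..=U+115F).
fn could_compose(c: char) -> bool {
    matches!(c, '\u{0300}'..='\u{036F}' | '\u{1100}'..='\u{115F}')
}

/// Toy composer: only rewrites 'e' + U+0301 to the precomposed U+00E9.
/// A real implementation would apply full canonical composition.
fn compose(text: &str) -> String {
    text.replace("e\u{0301}", "\u{00E9}")
}

/// Toy per-scalar table standing in for the transliteration data.
fn map_scalar(c: char) -> Option<&'static str> {
    match c {
        '\u{00E9}' => Some("e"),
        '\u{00DF}' => Some("ss"),
        _ => None,
    }
}

/// One decode loop. `None` is returned only as the compose-bail signal,
/// which can happen only when `detect_compose` is true.
fn transliterate_run(text: &str, detect_compose: bool) -> Option<Cow<'_, str>> {
    let mut out: Option<String> = None;
    for (i, c) in text.char_indices() {
        // The check reuses the scalar the loop has already decoded.
        if detect_compose && could_compose(c) {
            return None;
        }
        match map_scalar(c) {
            // First replacement: copy the untouched prefix, then append.
            Some(rep) => out.get_or_insert_with(|| text[..i].to_owned()).push_str(rep),
            None => {
                if let Some(buf) = out.as_mut() {
                    buf.push(c);
                }
            }
        }
    }
    Some(match out {
        Some(s) => Cow::Owned(s),
        None => Cow::Borrowed(text),
    })
}

/// Caller: fast borrowed attempt first, composed rerun only after a bail.
fn transliterate(text: &str) -> Cow<'_, str> {
    if let Some(done) = transliterate_run(text, true) {
        return done;
    }
    let composed = compose(text);
    let rerun = transliterate_run(&composed, false)
        .expect("detect_compose = false never bails");
    Cow::Owned(rerun.into_owned())
}
```

In this sketch, mark-free input such as `"abc"` comes back borrowed after a single pass, while `"cafe\u{0301}"` bails at the combining acute, is composed, and is rerun with detection off.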

Correctness

Behaviour is identical — the bail fires on exactly the chars needs_composition did. Verified:

Gated exhaustive (tier 3): 16/16 — all 11,172 Hangul syllables, full BMP, all CJK, 15 Indic blocks
Formal invariants: 14; tier-1: 21 binaries; Python: 949 transliterate/confusables/form-invariance/surrogate tests
Confirmed pre/post-fusion output identical on ≇/≅ etc.

Measurements (Rust-level micro-bench, no PyO3 dilution; pre → post)

corpus	before (ns/char)	after (ns/char)	change
latin	6.7	4.5	−33%
cyrillic	13.7	11.0	−20%
mixed	10.0	7.9	−21%
greek	12.9	11.4	−12%
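
As a quick cross-check of the last column, the relative change can be recomputed from the before/after figures. The sketch below assumes the values are ns/char (as the commit message states) and uses hypothetical helper names; it only redoes the arithmetic and measures nothing.

```rust
/// Relative change from `before` to `after`, in percent (negative = faster).
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// (corpus, before, after), copied from the table above.
const ROWS: [(&str, f64, f64); 4] = [
    ("latin", 6.7, 4.5),
    ("cyrillic", 13.7, 11.0),
    ("mixed", 10.0, 7.9),
    ("greek", 12.9, 11.4),
];

/// One formatted line per corpus, rounded to a whole percent.
fn report() -> Vec<String> {
    ROWS.iter()
        .map(|(name, before, after)| {
            format!("{name}: {:.0}%", percent_change(*before, *after))
        })
        .collect()
}
```

Rounded to whole percentages, the results match the table: −33%, −20%, −21%, and −12%.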

The CI perf-gate will quantify the recovery on the canonical bucket via the comparator ratio (which cancels per-call overhead).

Separate finding (not this PR): the #492 adversarial oracle (hypothesis-marked, dev-tier, excluded from CI) flags that ml_normalize('≇') is non-idempotent (≇ → ≅ → "approximately equal"). Confirmed pre-existing on main (identical pre/post fusion) — a symbol-naming/NFKC fixed-point issue, orthogonal to this change.
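
For readers unfamiliar with the property in question, a function `f` is idempotent on an input when `f(f(x)) == f(x)`. The sketch below uses a toy normalizer that only mimics the chain reported above; it is not disarm's `ml_normalize`, and both function names are hypothetical.

```rust
/// Toy stand-in, NOT disarm's ml_normalize: performs one step of the
/// reported chain per call.
fn toy_normalize(s: &str) -> String {
    match s {
        "≇" => "≅".to_string(),
        "≅" => "approximately equal".to_string(),
        other => other.to_string(),
    }
}

/// True when applying `f` a second time changes nothing.
fn is_idempotent_on(f: impl Fn(&str) -> String, input: &str) -> bool {
    let once = f(input);
    let twice = f(&once);
    once == twice
}
```

With this toy, `is_idempotent_on(toy_normalize, "≇")` is false because the first call yields `≅` and the second yields the symbol name, which is the shape of failure the oracle reports.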

🤖 Generated with Claude Code

Recovers most of the form-invariance (#475/#477/#481) transliterate regression.

The compose-at-lookup boundary added a `needs_composition(text)` pre-scan that walked the *entire* string a second time (UTF-8 decode + `is_combining_mark` trie lookup per char) before the actual transliteration, so every non-ASCII call paid two full passes where it used to pay one. On the perf-results bucket the latin/unidecode speedup ratio fell ~18 -> ~11 (PR #474 -> #480), all comparators dropping proportionally (it's disarm, not the comparators). This is exactly the 2026-06-20 review's M-3 finding.

Fix: detect composition *inside* the engine's existing decode loop instead of in a separate pass. `transliterate_run` now takes `detect_compose` and returns `Option`; on the fast (borrowed-input) attempt it bails the instant it decodes a combining mark or conjoining jamo, using the scalar it already decoded, and the caller redoes the work over the composed buffer (with `detect_compose = false` so a leftover lone mark there doesn't re-trigger). Mark-free input (the common case) now completes in one pass with no second scan.

Also replaces the `is_combining_mark` trie lookup in `needs_composition` (still used by the confusables path) with a `could_compose` range fast-path over U+0000–058F, where marks live only in two small sub-blocks.

Behaviour is identical (the bail triggers on exactly the chars `needs_composition` did); verified by the full form-invariance suite, the 16 gated exhaustive tests (all Hangul / BMP / CJK / Indic), tier-1 (21 binaries), formal invariants, and 949 Python transliterate/confusables/form-invariance tests.

Rust-level micro-bench (no PyO3), pre -> post: latin 6.7 -> 4.5 (-33%), cyrillic 13.7 -> 11.0 (-20%), greek 12.9 -> 11.4 (-12%), mixed 10.0 -> 7.9 (-21%) ns/char.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Richard Quinn <quinn.richard@gmail.com>

Copilot

Pull request overview

This PR improves transliterate hot-path performance by eliminating the extra needs_composition(text) pre-scan and instead detecting “compose-required” characters inside the transliteration engine’s existing UTF-8 decode loop, preserving the compose-at-lookup form-invariance behavior while recovering most of the regression introduced by the boundary compose work.

Changes:

Fuse compose detection into the transliteration engine pass: attempt a borrowed-input run first and bail to a composed-buffer rerun only when a compose-triggering scalar is encountered.
Thread a detect_compose flag through transliterate_dispatch/transliterate_run, switching them to return Option<Cow<str>> so the fast pass can signal “needs compose”.
Speed up needs_composition with a new could_compose predicate that adds a cheap range fast-path over U+0000–058F; document the perf recovery in the changelog.
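
A range fast-path of the kind described in the last item could look roughly like this. The two sub-block ranges are an assumption based on where Unicode places combining marks below U+0590 (the Combining Diacritical Marks block and the Cyrillic combining signs), and `slow_lookup` is a hypothetical stand-in for the trie lookup that lists only a few ranges for illustration.

```rust
/// Hypothetical stand-in for the `is_combining_mark` trie lookup;
/// only a few illustrative ranges are listed.
fn slow_lookup(c: char) -> bool {
    matches!(
        c,
        '\u{0591}'..='\u{05BD}'     // Hebrew accents and points
            | '\u{1100}'..='\u{115F}' // Hangul leading (L) jamo
            | '\u{20D0}'..='\u{20F0}' // combining marks for symbols
    )
}

/// Cheap range test for U+0000..=U+058F, slow path for everything else.
fn could_compose(c: char) -> bool {
    let cp = c as u32;
    if cp <= 0x058F {
        // Assumed sub-blocks: U+0300..=U+036F and U+0483..=U+0489.
        return (0x0300..=0x036F).contains(&cp) || (0x0483..=0x0489).contains(&cp);
    }
    slow_lookup(c)
}

/// Pre-scan still needed by callers that do not run the fused engine pass.
fn needs_composition(text: &str) -> bool {
    text.chars().any(could_compose)
}
```

Under these assumptions, Latin, Greek, and Cyrillic letters are settled by two integer range comparisons and never reach the slow path.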

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`src/transliterate.rs`	Implements the fused fast-path attempt, adds `detect_compose` plumbing, and makes the engine bail to compose only when needed.
`src/compose.rs`	Adds `could_compose` and updates `needs_composition` to use it, reducing per-scalar overhead in common ranges.
`CHANGELOG.md`	Records the performance regression root cause and the fused-pass recovery.

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…gnal

Copilot (#497): say 'conjoining Hangul L jamo' (the precise `could_compose` predicate), not the broader 'conjoining jamo'; and document that `transliterate_dispatch` returns `None` only as the compose-bail signal (with detect_compose=true), never as 'no output'. Comment-only.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Richard Quinn <quinn.richard@gmail.com>

raeq · 2026-06-21T19:41:48Z

Review comments addressed in abdad76 (comment-only): the two fusion comments now read "combining mark or conjoining Hangul L jamo (the could_compose predicate)" instead of the broader "conjoining jamo", and transliterate_dispatch's doc now documents that None is returned only as the compose-bail control-flow signal (with detect_compose = true), never as a normal "no output" case.

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…try into 0.11.0

Two 0.11 release-prep touch-ups per RELEASING.md:

- BINDINGS.md gains the per-binding<->core compatibility note (RELEASING.md 'Across languages'): all registries wrap core 0.11 at 0.11, with the per-registry-patch drift rule spelled out. Cheap insurance ahead of the npm binding's independent patch lane.
- CHANGELOG: move the transliterate compose-fusion perf entry (#497) out of [Unreleased] and into [0.11.0], since 0.11.0 is not yet tagged and the entry ships in that release.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Richard Quinn <quinn.richard@gmail.com>

…g-fold docs(release): binding↔core compatibility note + fold #497 perf into 0.11.0

raeq added 2 commits June 21, 2026 21:06

docs(changelog): note the transliterate compose-fusion perf recovery

b45b4e1

Assisted-by: Claude Code:claude-opus-4-8 Signed-off-by: Richard Quinn <quinn.richard@gmail.com>

Copilot AI review requested due to automatic review settings June 21, 2026 19:12

raeq enabled auto-merge (squash) June 21, 2026 19:12

Copilot started reviewing on behalf of raeq June 21, 2026 19:13

Copilot AI reviewed Jun 21, 2026

github-actions Bot added a commit that referenced this pull request Jun 21, 2026

perf-results: append measurement (PR #497, 450082e)

c299dcb

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

raeq mentioned this pull request Jun 21, 2026

ml_normalize is non-idempotent on NFKD-decomposable symbols (≇ → ≅ → "approximately equal") #498

Open

raeq merged commit 462f560 into main Jun 21, 2026
21 checks passed

raeq deleted the perf/fuse-compose-scan branch June 21, 2026 19:44

github-actions Bot added a commit that referenced this pull request Jun 21, 2026

perf-results: append measurement (PR #497, 1e951f8)

cfed7dc

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

raeq mentioned this pull request Jun 21, 2026

docs(release): binding↔core compatibility note + fold #497 perf into 0.11.0 #499

Merged

raeq added a commit that referenced this pull request Jun 21, 2026

Merge pull request #499 from raeq/chore/0.11-compat-note-and-changelo…

a958ee4

…g-fold docs(release): binding↔core compatibility note + fold #497 perf into 0.11.0

raeq merged 3 commits into main from perf/fuse-compose-scan
