
release/2.1: Fix off-by-one errors in CollationSpecialPrimariesValidated #8075

Merged
sffc merged 1 commit into unicode-org:release/2.1 from sffc:fix-collator-2.1 on Jun 19, 2026

Conversation

@sffc

sffc commented Jun 12, 2026

Member

Depends on #8078

On release/2.2: #8081
On main: #8080


This PR fixes a critical off-by-one bug in icu_collator 2.1.x that causes a panic when using AlternateHandling::Shifted with MaxVariable::Currency.

This bug was found by the Unicode conformance testing project (credit to @sven-oly).

Cause

When converting CollationSpecialPrimaries to CollationSpecialPrimariesValidated, the last_primaries vector was incorrectly truncated to MaxVariable::Currency as usize (3) instead of MaxVariable::Currency as usize + 1 (4). This resulted in last_primaries lacking the 4th element (index 3, for Currency), causing a panic (unwrap on None) during comparison when last_primary_for_group was called.
Additionally, compressible_bytes extraction was misaligned by one index and failed the length check, causing it to always fall back to hardcoded defaults.
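
The failure mode described above can be reproduced in miniature. The following is a sketch, not the icu_collator code: the `MaxVariable` enum is a local stand-in whose discriminants follow the value quoted in this description (`Currency as usize` is 3), the primary values are made up, and `truncate_last_primaries` and `last_primary_for_group` are hypothetical helpers written to return `Option` instead of panicking.

```rust
/// Local stand-in for icu_collator's `MaxVariable` (assumption: the
/// discriminants match the "Currency as usize (3)" stated in this PR).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MaxVariable {
    Space = 0,
    Punctuation = 1,
    Symbol = 2,
    Currency = 3,
}

/// Hypothetical helper: keep only the first `len` per-group entries.
fn truncate_last_primaries(raw: &[u16], len: usize) -> Vec<u16> {
    raw.iter().copied().take(len).collect()
}

/// Hypothetical lookup that reports a missing entry as `None`.
/// Calling `unwrap()` on that `None` is the kind of panic the PR describes.
fn last_primary_for_group(last_primaries: &[u16], group: MaxVariable) -> Option<u16> {
    last_primaries.get(group as usize).copied()
}

fn main() {
    // Made-up primaries, one per group (Space, Punctuation, Symbol, Currency).
    let raw: [u16; 4] = [0x0500, 0x0B00, 0x0C00, 0x0D00];

    // Buggy truncation: length 3, so index 3 (Currency) does not exist.
    let buggy = truncate_last_primaries(&raw, MaxVariable::Currency as usize);
    assert_eq!(last_primary_for_group(&buggy, MaxVariable::Currency), None);

    // Fixed truncation: length 4, so index 3 is present.
    let fixed = truncate_last_primaries(&raw, MaxVariable::Currency as usize + 1);
    assert_eq!(last_primary_for_group(&fixed, MaxVariable::Currency), Some(0x0D00));
}
```

The lower groups still resolve correctly with the buggy length, which is consistent with the panic only appearing when MaxVariable::Currency is selected.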

Fix

Corrected the off-by-one errors in both Collator::try_new_unstable and CollatorBorrowed::try_new in comparison.rs by adding 1 to MaxVariable::Currency as usize where appropriate.

Also bumped icu_collator to 2.1.2. --> this will be done in another PR

Context on main branch

On the main branch, this was refactored. The intermediate CollationSpecialPrimariesValidated struct was removed, and CollationSpecialPrimaries is used directly. The bug is naturally avoided there because the deserialization logic correctly uses MaxVariable::Currency as usize + 1 to split the data:
See: provider.rs:L635 on main
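
As an illustration of why splitting at `MaxVariable::Currency as usize + 1` avoids the problem, here is a minimal sketch on a plain `u16` slice. `CURRENCY_INDEX` and `split_special_primaries` are hypothetical names and this is not the code in provider.rs; the only facts taken from this PR are that `Currency as usize` is 3 and that the split point is one past it.

```rust
/// Assumption taken from the PR text: `MaxVariable::Currency as usize` is 3.
const CURRENCY_INDEX: usize = 3;

/// Hypothetical helper: split the raw data into the per-group entries
/// (indices 0..=CURRENCY_INDEX) and whatever follows them.
/// Returns `None` if the data is too short to contain every group.
fn split_special_primaries(data: &[u16]) -> Option<(&[u16], &[u16])> {
    let cut = CURRENCY_INDEX + 1;
    if data.len() < cut {
        // `split_at` would panic here, so bail out early instead.
        return None;
    }
    Some(data.split_at(cut))
}

fn main() {
    // Twenty made-up entries: 0, 1, 2, ..., 19.
    let data: Vec<u16> = (0..20).collect();

    let (groups, rest) = split_special_primaries(&data).expect("data is long enough");
    assert_eq!(groups.len(), 4); // the Currency entry is included
    assert_eq!(groups[CURRENCY_INDEX], 3);
    assert_eq!(rest.len(), 16);

    // Data holding only three entries cannot cover the Currency group.
    assert!(split_special_primaries(&data[..3]).is_none());
}
```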

🤖 This pull request was created by an AI agent working with @sffc.

Changelog

icu_collator (2.1.2)

  • Fix panic when using AlternateHandling::Shifted with MaxVariable::Currency (off-by-one in special primaries validation).

Comment thread components/collator/tests/tests.rs Outdated
@sffc sffc changed the title from "Fix off-by-one errors in CollationSpecialPrimariesValidated" to "2.1 Branch: Fix off-by-one errors in CollationSpecialPrimariesValidated" on Jun 12, 2026
@sffc sffc marked this pull request as ready for review June 12, 2026 23:29
@sffc sffc requested review from a team, Manishearth, echeran and hsivonen as code owners June 12, 2026 23:29

@Manishearth Manishearth left a comment

Member

It's kind of hard to understand what actually broke here, but that's because I don't understand this part of the collation api

Comment thread components/collator/src/comparison.rs Outdated
  let special_primaries = special_primaries.map_project(|csp, _| {
      let compressible_bytes = (csp.last_primaries.len()
-         == MaxVariable::Currency as usize + 16)
+         == MaxVariable::Currency as usize + 17)

Member

While we're here we should add a comment fixing the magic numbers

Comment thread components/collator/src/comparison.rs Outdated
  .as_maybe_borrowed()?
  .as_ule_slice()
- .get((MaxVariable::Currency as usize)..)?
+ .get((MaxVariable::Currency as usize + 1)..)?

Member

and just explaining the +1 here

Comment thread components/collator/src/comparison.rs Outdated
  .as_slice()
  .as_ule_slice()
- .split_at(MaxVariable::Currency as usize)
+ .split_at(MaxVariable::Currency as usize + 1)

Member

actually, should these be static methods on MaxVariable? after_currency and something different for the 17 one?
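
One possible shape for that suggestion, sketched on a local stand-in enum rather than the real `MaxVariable`. `GROUP_COUNT`, `COMPRESSIBLE_WORDS`, `packed_len`, and `has_packed_compressible_bytes` are hypothetical names. Reading the 17 in the diff above as 1 + 16 (one slot for Currency plus sixteen trailing entries for compressible_bytes) is an assumption inferred from the changed lines, not something this thread confirms.

```rust
/// Local stand-in for icu_collator's `MaxVariable` (assumption: the
/// discriminants match the "Currency as usize (3)" stated in this PR).
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum MaxVariable {
    Space = 0,
    Punctuation = 1,
    Symbol = 2,
    Currency = 3,
}

impl MaxVariable {
    /// Number of per-group entries, i.e. indices 0..=Currency.
    /// This is the "+ 1" that the fix adds.
    const GROUP_COUNT: usize = MaxVariable::Currency as usize + 1;

    /// Assumed number of trailing entries that carry compressible_bytes.
    const COMPRESSIBLE_WORDS: usize = 16;

    /// Expected total length when the trailing entries are present.
    const fn packed_len() -> usize {
        Self::GROUP_COUNT + Self::COMPRESSIBLE_WORDS
    }
}

/// Hypothetical replacement for the magic-number length check.
fn has_packed_compressible_bytes(last_primaries_len: usize) -> bool {
    last_primaries_len == MaxVariable::packed_len()
}

fn main() {
    assert_eq!(MaxVariable::GROUP_COUNT, 4);
    // Same value as the fixed check `MaxVariable::Currency as usize + 17`.
    assert_eq!(MaxVariable::packed_len(), MaxVariable::Currency as usize + 17);
    assert!(has_packed_compressible_bytes(20));
    // The old check expected `Currency as usize + 16`, i.e. 19.
    assert!(!has_packed_compressible_bytes(19));
}
```

Naming the two quantities separately keeps the split point and the length check from drifting apart, which is the kind of mismatch this PR fixes.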

Comment thread components/collator/tests/tests.rs Outdated
// TODO: Consider testing ff-Adlm for supplementary-plane tailoring, including contractions

#[test]
fn test_repro_user_bug() {

Member

please explain this test

🤖 Done! The test has been renamed to test_shifted_max_variable_currency and a detailed comment has been added to explain what it is verifying.

@Manishearth

Member

@sffc Also, I believe our patch release stance is to merge to main and then copy a patch over, yes?

@sffc

sffc commented Jun 13, 2026

Member Author

> @sffc Also, I believe our patch release stance is to merge to main and then copy a patch over, yes?

The bug is in 2.1. It is fixed on main due to other refactorings, perhaps #7891

I should check 2.2

@Manishearth

Member

Ah, okay

Comment thread tools/make/tidy.toml

Member

also here, why is this needed? if this is needed it should be a separate PR on the release branch ("Fix CI") and not part of this commit, which might be cherry-picked, patched, etc.

Member Author

Comment thread tools/make/bakeddata/src/main.rs Outdated
(
"collator",
icu::collator::provider::MARKERS,
"version = \"2.1.2\"",

Member

there's no data diff, why are you bumping this?

Member Author

good catch! I missed this. Fixed.

Comment thread components/collator/tests/tests.rs Outdated
Comment on lines +2135 to +2138
let payload: DataPayload<CollationSpecialPrimariesV1> =
provider.load(Default::default()).unwrap().payload;
let csp = payload.get();
assert!(csp.last_primaries.len() > MaxVariable::Currency as usize);

Member

issue: some condition on the data struct is not a "user bug". this seems to be testing implementation details, why does the constructor call below not suffice as a test?

Member Author

Checking the root cause seems harmless, but since you labeled this "issue", I assume you consider this blocking, so I will remove this extra assertion.

🤖 Done! The extra assertion has been removed, leaving only the constructor and comparison test.

Member

we have a pinned toolchain, a pinned clippy version, a pinned MSRV, etc., so why does this unrelated code need to change?

Member Author

Somehow this got patched into the 2.1 branch without the docs. Fixing in #8078.

@robertbastian robertbastian left a comment

Member

> Context on 2.2 (main)
>
> In ICU4X 2.2 (main branch), this was refactored.

Main is not 2.2; main is 2.3. This bug exists on 2.2. In fact, users who depend on version = "2" will be on 2.2, so this should be fixed first and foremost on the 2.2 release branch, and then backported to other releases if needed (2.1, 2.0). This PR only fixes the bug for users who depend on version = "~2.1". The conformance project does that, but such users are a small minority.

@robertbastian

Member

> > @sffc Also, I believe our patch release stance is to merge to main and then copy a patch over, yes?
>
> The bug is in 2.1. It is fixed on main due to other refactorings, perhaps #7891
>
> I should check 2.2

It would still be good to have the test on main

@sffc sffc force-pushed the fix-collator-2.1 branch from e16fa02 to 5e5ab3d on June 15, 2026 22:50
@sffc

sffc commented Jun 15, 2026

Member Author

The force-push was to make this PR sit cleanly on top of the CI-fixing commits which are moved into #8078

@sffc sffc force-pushed the fix-collator-2.1 branch from 5e5ab3d to 720fcda on June 15, 2026 22:53
@sffc sffc changed the title from "2.1 Branch: Fix off-by-one errors in CollationSpecialPrimariesValidated" to "release/2.1: Fix off-by-one errors in CollationSpecialPrimariesValidated" on Jun 15, 2026
@sffc

sffc commented Jun 15, 2026

Member Author

There are now 4 PRs:

@dpulls

dpulls Bot commented Jun 15, 2026


🎉 All dependencies have been resolved!

@sffc sffc force-pushed the fix-collator-2.1 branch from 720fcda to b477351 on June 15, 2026 23:18
sffc added a commit that referenced this pull request Jun 16, 2026
…ariable::Currency (#8080)

See #8075, #8081

Co-authored-by: Gemini <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@sffc sffc force-pushed the fix-collator-2.1 branch from b477351 to 78ef0a0 on June 16, 2026 01:07
@sffc sffc force-pushed the fix-collator-2.1 branch from 78ef0a0 to 90cf32f on June 16, 2026 01:27
@sffc sffc force-pushed the fix-collator-2.1 branch from 90cf32f to fbcdd1c on June 16, 2026 01:42
sffc added a commit that referenced this pull request Jun 16, 2026
…erialization (#8083)

See #8075

Co-authored-by: Gemini <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@robertbastian robertbastian left a comment

Member

I don't particularly like the architecture of the constants, but as this is not main I don't care. I have left comments on #8083.

> Also bumped icu_collator to 2.1.2. --> this will be done in another PR

This should be done in the same PR, doing this in a different PR is extra work for downstream users if they want to patch this fix.

@sffc

sffc commented Jun 16, 2026

Member Author

> This should be done in the same PR, doing this in a different PR is extra work for downstream users if they want to patch this fix.

I thought we decoupled fixes from the Cargo.toml diff, since people who vendor don't particularly care about the Cargo.toml diff, and it makes it harder to apply the same diff across multiple versions (like 2.1 and 2.2)?

@sffc

sffc commented Jun 16, 2026

Member Author

I'll update this after #8089 lands.

Fixes a critical off-by-one bug in icu_collator 2.1.x that causes a panic when using AlternateHandling::Shifted with MaxVariable::Currency.

When converting CollationSpecialPrimaries to CollationSpecialPrimariesValidated, the last_primaries vector was incorrectly truncated to MaxVariable::Currency as usize (3) instead of MaxVariable::Currency as usize + 1 (4). This resulted in last_primaries lacking the 4th element (index 3, for Currency), causing a panic (unwrap on None) during comparison when last_primary_for_group was called.
Additionally, compressible_bytes extraction was misaligned by one index and failed the length check, causing it to always fall back to hardcoded defaults.

This commit squashes the fix and the regression test into a single clean commit.

Co-authored-by: Gemini <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@sffc sffc force-pushed the fix-collator-2.1 branch from fbcdd1c to d7aea51 on June 18, 2026 01:28
@sffc sffc requested a review from robertbastian June 18, 2026 01:32
@sffc sffc merged commit 491bfab into unicode-org:release/2.1 Jun 19, 2026
29 checks passed
@sffc sffc deleted the fix-collator-2.1 branch June 19, 2026 04:01

4 participants