Skip to content

Address Robert's review comments on variant casing#8090

Open
sffc wants to merge 2 commits into
unicode-org:mainfrom
sffc:feedback-8084
Open

Address Robert's review comments on variant casing#8090
sffc wants to merge 2 commits into
unicode-org:mainfrom
sffc:feedback-8084

Conversation

@sffc

@sffc sffc commented Jun 16, 2026

Copy link
Copy Markdown
Member

This PR addresses Robert's review comments on PR #8084.

Instead of converting back and forth between BCP-47 lowercase and CLDR uppercase in the provider using closures, we now normalize the variant keys to lowercase in the CLDR JSON deserialization step in cldr_serde. This allows us to simplify the provider implementation and completely revert the macro changes from the first PR, as the key/attribute transforms are no longer needed.

🤖 This pull request was created by an AI agent working with @sffc.

Changelog

N/A

sffc and others added 2 commits June 16, 2026 15:02
Normalize variant keys to lowercase during CLDR deserialization, allowing us to simplify the provider and remove macro transforms. Also document the macro arguments.

Co-authored-by: Gemini <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Now that we normalize variant keys during CLDR deserialization, the key/attribute transforms are no longer needed, so we can revert the macro to its original simpler state.

Co-authored-by: Gemini <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@sffc sffc marked this pull request as ready for review June 16, 2026 22:07
let map = HashMap::<String, String>::deserialize(deserializer)?;
Ok(map
.into_iter()
.map(|(k, v)| (k.to_ascii_lowercase(), v))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This comment is written by an AI agent working with @sffc.

We cannot simply use Variant::try_from_str here because some keys in CLDR might contain alt variant suffixes (e.g., -alt-variant), which are not valid BCP-47 variants as a whole and would fail to parse, causing deserialization to fail.

If we wanted to use Variant for normalization, we would need to split the key, normalize the variant part, and reconstruct it. Here is how we could handle it:

fn normalize_variant_key(k: &str) -> String {
    if let Some((var, alt)) = k.split_once("-alt-") {
        if let Ok(variant) = Variant::try_from_str(var) {
            format!("{}-alt-{}", variant.as_str(), alt)
        } else {
            k.to_ascii_lowercase()
        }
    } else {
        if let Ok(variant) = Variant::try_from_str(k) {
            variant.as_str().to_string()
        } else {
            k.to_ascii_lowercase()
        }
    }
}

Since to_ascii_lowercase() is a safe fallback and is sufficient for all valid variants (which are just ASCII alphanumeric), we are sticking with it for simplicity.

@robertbastian robertbastian Jun 17, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variants should work though? POSIX-alt-foo parses as a Variants and normalizes to posix-alt-foo

@sffc sffc Jun 17, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a coincidence that POSIX-alt-foo parses as a Variants. The -alt-foo could be -alt-x (too short) or -alt-abbreviated (too long).

Given these options:

  1. Apply to_ascii_lowercase() to the whole string (current PR)
  2. Parse the first subtag as Variant and apply to_ascii_lowercase() to the rest of the string (the suggestion above in this thread, with tweaks)
  3. Parse the whole string as Variants

my preference order is: 1 ~> 2 ~>> 3

Wdyt?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be 2. we also need to error if it doesn't parse as a variant due to length, or because it contains non-ascii characters. you probably even want to store (Variant, String) as the key, if you want to use the Variant -> DataMarkerAttributes constructor later

@robertbastian robertbastian left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@sffc sffc requested a review from robertbastian June 17, 2026 23:33
@Manishearth

Copy link
Copy Markdown
Member

Approach seems better, will let Robert review since it's his feedback

@robertbastian robertbastian left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants