Address Robert's review comments on variant casing#8090
Conversation
Normalize variant keys to lowercase during CLDR deserialization, allowing us to simplify the provider and remove macro transforms. Also document the macro arguments. Co-authored-by: Gemini <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Now that we normalize variant keys during CLDR deserialization, the key/attribute transforms are no longer needed, so we can revert the macro to its original simpler state. Co-authored-by: Gemini <176961590+gemini-code-assist[bot]@users.noreply.github.com>
| let map = HashMap::<String, String>::deserialize(deserializer)?; | ||
| Ok(map | ||
| .into_iter() | ||
| .map(|(k, v)| (k.to_ascii_lowercase(), v)) |
There was a problem hiding this comment.
🤖 This comment is written by an AI agent working with @sffc.
We cannot simply use Variant::try_from_str here because some keys in CLDR might contain alt variant suffixes (e.g., -alt-variant), which are not valid BCP-47 variants as a whole and would fail to parse, causing deserialization to fail.
If we wanted to use Variant for normalization, we would need to split the key, normalize the variant part, and reconstruct it. Here is how we could handle it:
fn normalize_variant_key(k: &str) -> String {
if let Some((var, alt)) = k.split_once("-alt-") {
if let Ok(variant) = Variant::try_from_str(var) {
format!("{}-alt-{}", variant.as_str(), alt)
} else {
k.to_ascii_lowercase()
}
} else {
if let Ok(variant) = Variant::try_from_str(k) {
variant.as_str().to_string()
} else {
k.to_ascii_lowercase()
}
}
}Since to_ascii_lowercase() is a safe fallback and is sufficient for all valid variants (which are just ASCII alphanumeric), we are sticking with it for simplicity.
There was a problem hiding this comment.
Variants should work though? POSIX-alt-foo parses as a Variants and normalizes to posix-alt-foo
There was a problem hiding this comment.
I think it's a coincidence that POSIX-alt-foo parses as a Variants. The -alt-foo could be -alt-x (too short) or -alt-abbreviated (too long).
Given these options:
- Apply
to_ascii_lowercase()to the whole string (current PR) - Parse the first subtag as
Variantand applyto_ascii_lowercase()to the rest of the string (the suggestion above in this thread, with tweaks) - Parse the whole string as
Variants
my preference order is: 1 ~> 2 ~>> 3
Wdyt?
There was a problem hiding this comment.
it should be 2. we also need to error if it doesn't parse as a variant due to length, or because it contains non-ascii characters. you probably even want to store (Variant, String) as the key, if you want to use the Variant -> DataMarkerAttributes constructor later
|
Approach seems better, will let Robert review since it's his feedback |
This PR addresses Robert's review comments on PR #8084.
Instead of converting back and forth between BCP-47 lowercase and CLDR uppercase in the provider using closures, we now normalize the variant keys to lowercase in the CLDR JSON deserialization step in
cldr_serde. This allows us to simplify the provider implementation and completely revert the macro changes from the first PR, as the key/attribute transforms are no longer needed.🤖 This pull request was created by an AI agent working with @sffc.
Changelog
N/A