GraphemeCursor::is_boundary returns wrong value inside emoji when chunked one codepoint at a time #139

@pfgithub

Description

In the first test (passes), the chunk containing both [man] and [zwj] is passed to provide_context in a single call. This works and returns the expected result. In the second test (fails), the chunk containing only [zwj] is provided first, followed by the chunk containing [man]. This does not work: the final is_boundary call returns 'true', as if there were a boundary in the middle of the emoji.

#[cfg(test)]
mod tests {
    use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete::*};

    const FAMILY_EMOJI: &str = "A\u{1F468}\u{200D}\u{1F469}\u{1F467}B";
    // Code points by byte offset:
    // [0: A] [1: MAN] [5: ZERO WIDTH JOINER] [8: WOMAN] [12: GIRL] [16: B]

    // MAN and ZWJ are supplied together in one pre-context chunk.
    #[test]
    fn passes() {
        let mut cursor = GraphemeCursor::new(8, FAMILY_EMOJI.len(), true);
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Err(PreContext(8)));
        cursor.provide_context(&FAMILY_EMOJI[1..8], 1);
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Ok(false));
    }

    // ZWJ and MAN are supplied as two separate pre-context chunks, one code point each.
    #[test]
    fn fails() {
        let mut cursor = GraphemeCursor::new(8, FAMILY_EMOJI.len(), true);
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Err(PreContext(8)));
        cursor.provide_context(&FAMILY_EMOJI[5..8], 5);
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Err(PreContext(5)));
        cursor.provide_context(&FAMILY_EMOJI[1..5], 1);
        // Expected Ok(false); this is the assertion that fails, because the cursor reports a boundary.
        assert_eq!(cursor.is_boundary(&FAMILY_EMOJI[8..], 8), Ok(false));
    }
}
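
For anyone double-checking the slice indices used in the tests (1, 5, 8), here is a small standard-library-only sketch that lists the byte offset of each code point in the test string. The helper name codepoint_offsets is made up for this illustration and is not part of unicode_segmentation.

```rust
/// Hypothetical helper (not part of any crate): returns the byte offset and
/// scalar value of every code point in `s`.
fn codepoint_offsets(s: &str) -> Vec<(usize, u32)> {
    s.char_indices().map(|(i, c)| (i, c as u32)).collect()
}

fn main() {
    // Same string as in the reproduction.
    let s = "A\u{1F468}\u{200D}\u{1F469}\u{1F467}B";
    for (offset, cp) in codepoint_offsets(s) {
        // One line per code point: byte offset, then the scalar value in hex.
        println!("{offset:>2}: U+{cp:04X}");
    }
}
```

The offsets come out as 0, 1, 5, 8, 12 and 16, so [1..5] is exactly MAN, [5..8] is exactly the zero width joiner, and offset 8 is the start of WOMAN, directly after the joiner.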

Potentially related: #118
