UTF-8 files are not decoded before searching #396

@karenetheridge

$ echo 'fooಠbar' > test
$ ack 'foo.bar' test
$ ack 'foo..bar' test
$ ack 'foo...bar' test
fooಠbar

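It appears that the file's contents are matched as raw bytes rather than as decoded characters, so a non-ASCII character cannot be matched by a single . when its UTF-8 representation uses more than one octet. Here ಠ (U+0CA0) is encoded as three octets, which is why only the pattern with three dots matches.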
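
To illustrate the difference outside of ack, here is a minimal Python sketch (not ack's own code, which is Perl). It matches the same line once as raw UTF-8 bytes and once as decoded text, and records how many dots are needed in each case. The names `byte_hits` and `char_hits` are only for this example.

```python
import re

text = "fooಠbar"
raw = text.encode("utf-8")  # ಠ (U+0CA0) becomes three bytes: E0 B2 A0

# Matching against undecoded bytes: each "." consumes one byte.
byte_hits = [n for n in (1, 2, 3)
             if re.search(b"foo" + b"." * n + b"bar", raw)]

# Matching against decoded text: each "." consumes one character.
char_hits = [n for n in (1, 2, 3)
             if re.search("foo" + "." * n + "bar", text)]

print(byte_hits)  # [3]
print(char_hits)  # [1]
```

The byte-wise result mirrors the behaviour reported above, and the decoded result is what a user would normally expect from 'foo.bar'.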

Should files be assumed to be UTF-8 encoded? Or perhaps only when bytes outside the [\x00-\x7F] range are seen? Or should a command-line argument be required to specify the encoding?
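
One way the three options could be combined is sketched below in Python, purely as an illustration of the idea and not as a proposal for ack's actual implementation. `decode_line` and its `encoding` parameter are hypothetical names. The Latin-1 fallback is an assumption chosen because it maps every byte to exactly one character, which preserves the current byte-wise behaviour for input that is not valid UTF-8.

```python
from typing import Optional


def decode_line(raw: bytes, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: decode one line before it is searched."""
    # Option 3: an explicitly requested encoding always wins.
    if encoding is not None:
        return raw.decode(encoding)
    # Option 2: pure ASCII input needs no guessing.
    if raw.isascii():
        return raw.decode("ascii")
    # Option 1: otherwise assume UTF-8 ...
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ... and fall back to one character per byte (assumption).
        return raw.decode("latin-1")
```

With such a helper, `decode_line("fooಠbar".encode("utf-8"))` yields a seven-character string, so a single . would match ಠ.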
