Opening files and encoding #20

@severinsimmler

Opening the file here is unnecessary:

with open(file, "r"):

because lxml itself opens it there:

xml = etree.parse(file)

Cf.:

>>> help(lxml.etree.parse)
parse(source, parser=None, *, base_url=None)

Return an ElementTree object loaded with source elements.  If no parser
is provided as second argument, the default parser is used.
  
The ``source`` can be any of the following:
    
- a file name/path
- a file object
- a file-like object
- a URL using the HTTP or FTP protocol
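A minimal sketch of that point, assuming a small made-up TEI-like document: the path goes straight to `etree.parse`, with no surrounding `open()`. The sample content and the temporary file are only there for illustration, and the `ImportError` fallback to the standard library is an assumption so the snippet also runs where lxml is not installed.

```python
import os
import tempfile

try:
    from lxml import etree  # the library discussed in this issue
except ImportError:
    # Fallback so the sketch runs without lxml; its parse() accepts a path too.
    import xml.etree.ElementTree as etree

# Hypothetical sample document, written as bytes so no text encoding is involved.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<TEI><text>Gr\xc3\xbc\xc3\x9fe</text></TEI>"
)

with tempfile.TemporaryDirectory() as tmp:
    file = os.path.join(tmp, "sample.xml")
    with open(file, "wb") as handle:
        handle.write(sample)

    # No surrounding open(): parse() opens and closes the file itself
    # and decodes it according to the XML declaration.
    xml = etree.parse(file)
    root_tag = xml.getroot().tag
    text = xml.getroot().find("text").text

print(root_tag)
```

Because the parser reads the encoding from the XML declaration, handing it the path also avoids decoding the input with the platform default.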

And you should also specify encoding explicitly, especially here:

with open(outfile,"w") as output:

I'm quoting from the documentation:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

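AFAIK, Windows does not default to UTF-8 here; it typically uses a legacy code page such as cp1252. Writing non-ASCII characters can then fail or produce a file that other tools misread.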
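Something like the following would make the output independent of the platform. It is only a sketch: the file name and the sample text are made up, and `utf-8` is an assumption about what the output should be.

```python
import locale
import os
import tempfile

# What open() falls back to in text mode when no encoding is given.
print(locale.getpreferredencoding(False))

text = "Grüße aus Köln"  # hypothetical non-ASCII content

with tempfile.TemporaryDirectory() as tmp:
    outfile = os.path.join(tmp, "output.txt")

    # Explicit encoding: the bytes on disk are the same on every platform.
    with open(outfile, "w", encoding="utf-8") as output:
        output.write(text)

    # Read the raw bytes back to see what actually landed on disk.
    with open(outfile, "rb") as handle:
        raw = handle.read()

    # Reading needs the same explicit encoding to round-trip safely.
    with open(outfile, "r", encoding="utf-8") as handle:
        round_trip = handle.read()
```

The first `print` shows the encoding that would have been used implicitly on the current machine, which makes it easy to check whether the default differs from UTF-8.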

Thanks in any case for the TEI Xpath expressions 🙂
