read_xml crashes R when trying to read from an archive::archive_read connection if the connection is closed beforehand #470

@wkumler

I've been handed a database in which one of the fields contains XML documents as compressed blobs (ZIP/PK archives). Rather than writing each blob to disk and reading it back with read_xml, I tried using the archive package's bindings to libarchive to read it directly. This works well, but while trying to close the connections properly afterward I found that R crashes if I mistakenly close the rawConnection object before calling read_xml.

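# Minimal reproduction: a small raw blob stands in for the database field
xml_text <- '<text>Hi</text>'
blob <- charToRaw(xml_text)

blob_con <- rawConnection(blob)
spec_con <- archive::archive_read(blob_con)
# Mistake: the underlying raw connection is closed before it is read
close(blob_con)
# This call crashes the R session instead of raising a catchable error
xml2::read_xml(spec_con)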

The error I get when R crashes is:

Error in (function (con, rw = "")  : invalid connection
terminate called after throwing an instance of 'cpp11::unwind_exception'
  what():  std::exception

I honestly don't know whether this is an archive problem or an xml2 problem, so I'm happy to repost there. I don't know enough about the internals to tell where the problem lies; read_xml is simply the call that triggers the crash.

I'm on R 4.5.1 with xml2 1.3.8 and archive version 1.1.12.
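
For comparison, here is a minimal sketch of the ordering that avoids the crash, in which the raw connection is closed only after read_xml has returned. The helper name read_xml_blob is hypothetical, and the sketch assumes the blob is a real archive and that archive_read accepts a connection as in the reproduction above. Whether archive or xml2 also closes the underlying connection is not verified here, so the cleanup is wrapped in try().

```r
# Hypothetical helper: parse an XML document stored as an archive blob.
read_xml_blob <- function(blob) {
  blob_con <- rawConnection(blob)
  # Close the raw connection only when the function exits, i.e. after
  # parsing has finished or failed. try() guards against the case where
  # the connection was already closed elsewhere (an assumption, not
  # checked against the archive internals).
  on.exit(try(close(blob_con), silent = TRUE), add = TRUE)

  spec_con <- archive::archive_read(blob_con)
  xml2::read_xml(spec_con)
}
```

Called as read_xml_blob(blob) on each field value, this keeps the close-after-read order in one place so it cannot be reversed by accident.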
