Automate zipfile extraction #23

@cutright

Description

It would be helpful to port the automated zip file extraction code from the previous version of this repo:

https://github.com/cutright/IMRT-QA-Data-Miner/blob/85abf9dc66a139c02574c386377f46f0944c5893/IQDM/utilities.py#L190-L208

import zipfile
from os import walk
from os.path import join, splitext


def extract_files_from_zipped_files(init_directory, extract_to_path, extension='.pdf'):
    """
    Function to extract files (.pdf by default) from zipped files
    :param init_directory: initial top-level directory to walk through
    :type init_directory: str
    :param extract_to_path: directory to extract files into
    :type extract_to_path: str
    :param extension: file extension of file type to extract, set to None to extract all files
    :type extension: str or None
    """
    for dirName, subdirList, fileList in walk(init_directory):  # iterate through files and all sub-directories
        for fileName in fileList:
            # .lower() must be called; comparing the bound method itself is always False
            if splitext(fileName)[1].lower() == '.zip':
                zip_file_path = join(dirName, fileName)
                with zipfile.ZipFile(zip_file_path, 'r') as z:
                    for file_name in z.namelist():
                        # directory entries inside a zip archive end with '/'
                        is_dir = file_name.endswith('/')
                        if not is_dir and (extension is None or splitext(file_name)[1].lower() == extension):
                            z.extract(file_name, path=extract_to_path)
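
For the port, a minimal sketch of how this could look with `pathlib` is below. The function name `extract_from_zips` and the returned list of extracted paths are assumptions for illustration, not part of the linked code. The sketch calls `.lower()` on each suffix and uses `ZipInfo.is_dir()` to skip directory entries inside the archive.

```python
import zipfile
from pathlib import Path


def extract_from_zips(init_directory, extract_to_path, extension=".pdf"):
    """Extract matching members from every .zip file under init_directory.

    Hypothetical helper (name and return value are assumptions):
    returns the list of paths that were extracted.
    Set extension to None to extract all files.
    """
    extracted = []
    # sorted() materializes the listing before any extraction happens
    for zip_path in sorted(Path(init_directory).rglob("*")):
        if not zip_path.is_file() or zip_path.suffix.lower() != ".zip":
            continue
        with zipfile.ZipFile(zip_path, "r") as z:
            for info in z.infolist():
                if info.is_dir():  # skip directory entries in the archive
                    continue
                suffix = Path(info.filename).suffix.lower()
                if extension is None or suffix == extension.lower():
                    # extract() returns the path of the file it created
                    extracted.append(z.extract(info, path=extract_to_path))
    return extracted
```

Returning the extracted paths is one possible design choice; it would let the caller pass the new files straight to whatever parses the reports.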


Labels: enhancement (New feature or request)
