This repository was archived by the owner on Nov 21, 2025. It is now read-only.

_download_file_from_run silently does nothing if is_local_rank_zero() returns False #915

@fepegar

Description

This can be dangerous: the way health_azure checks is_local_rank_zero might be compatible with Lightning but not with other frameworks, in which case the files are unexpectedly missing after being "downloaded". This is especially problematic when validate_checksum is True, because one would expect the result to be double-checked after downloading.

def _download_file_from_run(
    run: Run, filename: str, output_file: Path, validate_checksum: bool = False
) -> Optional[Path]:
    """
    Download a single file from an Azure ML Run, optionally validating the content to ensure the file is not
    corrupted during download. If running inside a distributed setting, will only attempt to download the file
    onto the node with local_rank==0. This prevents multiple processes on the same node from trying to download
    the same file, which can lead to errors.

    :param run: The AML Run to download associated file for
    :param filename: The name of the file as it exists in Azure storage
    :param output_file: Local path to which the file should be downloaded
    :param validate_checksum: Whether to validate the content from HTTP response
    :return: The path to the downloaded file if local rank is zero, else None
    """
    if not is_local_rank_zero():
        return None
    run.download_file(filename, output_file_path=str(output_file), _validate_checksum=validate_checksum)
    return output_file

A minimally invasive change would be logging a warning before returning None. It might also be nice to propagate the paths of the downloaded files up to other functions that call this one. For example, maybe _download_files_from_run and download_files_from_run_id should return the paths of all downloaded files.
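A rough sketch of what both suggestions could look like is below. It is not the health_azure implementation: RunLike is a hypothetical stand-in for the part of the Azure ML Run interface used in the snippet above, is_local_rank_zero here is a simplified placeholder that assumes the launcher sets a LOCAL_RANK environment variable, and download_files_from_run is an illustrative wrapper rather than the library's existing _download_files_from_run.

```python
import logging
import os
from pathlib import Path
from typing import List, Optional, Protocol, Sequence

logger = logging.getLogger(__name__)


class RunLike(Protocol):
    """Hypothetical stand-in for the subset of the AML Run interface used below."""

    def download_file(self, name: str, output_file_path: str, _validate_checksum: bool = False) -> None:
        ...


def is_local_rank_zero() -> bool:
    """Simplified placeholder (assumption): treats a missing LOCAL_RANK as rank 0."""
    return int(os.environ.get("LOCAL_RANK", "0")) == 0


def download_file_from_run(
    run: RunLike, filename: str, output_file: Path, validate_checksum: bool = False
) -> Optional[Path]:
    """Download one file on local rank zero; on other ranks, warn and return None."""
    if not is_local_rank_zero():
        # The skip is now visible in the logs instead of being silent.
        logger.warning(
            "Skipping download of '%s' to '%s': this process is not local rank zero.",
            filename,
            output_file,
        )
        return None
    run.download_file(filename, output_file_path=str(output_file), _validate_checksum=validate_checksum)
    return output_file


def download_files_from_run(
    run: RunLike, filenames: Sequence[str], output_dir: Path, validate_checksum: bool = False
) -> List[Path]:
    """Download several files and return only the paths that were actually downloaded."""
    downloaded: List[Path] = []
    for filename in filenames:
        path = download_file_from_run(run, filename, output_dir / filename, validate_checksum)
        if path is not None:
            downloaded.append(path)
    return downloaded
```

With the paths returned, a caller can compare the result against the list of files it asked for and decide for itself whether an empty or partial result is an error.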
