Reuse pre-signed URL for multi-threaded downloads by BryanFauble · Pull Request #1407 · Sage-Bionetworks/synapsePythonClient

BryanFauble · 2026-06-12T17:00:21Z

Problem

When downloading a file larger than 8 MB (the multi-threaded download path), the client requests two pre-signed URLs from the /fileHandle/batch endpoint:

In download_by_file_handle, to retrieve file handle metadata (concrete type, content size, MD5, storage location) needed to choose the download strategy. Its preSignedURL is discarded on the multi-threaded path.
Again inside _MultithreadedDownloader (via PresignedUrlProvider._get_pre_signed_info), to actually download the chunks.

The first URL is thrown away, so every large-file download generates a redundant pre-signed URL — doubling the pre-signed URL count and polluting download metrics. (Small/single-threaded downloads and cache hits are unaffected — they already use the URL from step 1, or none at all.)

sequenceDiagram
    actor Client
    participant DBFH as download_by_file_handle
    participant API as /fileHandle/batch
    participant MTD as _MultithreadedDownloader
    participant Provider as PresignedUrlProvider

    Client->>DBFH: download (file > 8 MB)
    DBFH->>API: getFileHandleForDownload (#1)
    API-->>DBFH: metadata + preSignedURL
    Note over DBFH: preSignedURL DISCARDED<br/>(only metadata used)
    DBFH->>MTD: download_from_url_multi_threaded()
    MTD->>Provider: get_info()
    Provider->>API: getFileHandleForDownload (#2)
    API-->>Provider: preSignedURL
    Note over API: 2 pre-signed URLs<br/>requested per download
    loop each 8 MB chunk
        MTD->>Provider: get_info() (reuses cached URL)
    end

Solution

Forward the pre-signed URL already retrieved in step 1 into download_from_url_multi_threaded (the function already accepts a presigned_url parameter, used today by WikiPage). The downloader seeds PresignedUrlProvider._cached_info with it and skips the second batch call.

Crucially, the file handle identifiers (file_handle_id / object_id / object_type) are now also threaded into the DownloadRequest on the seeded-URL path, so the mid-download expiry/refetch flow is preserved: if the seeded URL nears expiry during a long transfer, PresignedUrlProvider.get_info() still refetches a fresh URL exactly as before. Wiki callers omit these identifiers, so their existing (no-refetch) behavior is unchanged.

sequenceDiagram
    actor Client
    participant DBFH as download_by_file_handle
    participant API as /fileHandle/batch
    participant MTD as _MultithreadedDownloader
    participant Provider as PresignedUrlProvider

    Client->>DBFH: download (file > 8 MB)
    DBFH->>API: getFileHandleForDownload (#1)
    API-->>DBFH: metadata + preSignedURL
    Note over DBFH: build PresignedUrlInfo<br/>from the SAME response
    DBFH->>MTD: download_from_url_multi_threaded(presigned_url=..., file_handle_id=...)
    MTD->>Provider: get_info()
    Note over Provider: returns seeded URL<br/>NO second API call
    loop each 8 MB chunk
        MTD->>Provider: get_info() (reuses cached URL)
        alt URL near expiry mid-download
            Provider->>API: getFileHandleForDownload (refetch)
            API-->>Provider: fresh preSignedURL
        end
    end
    Note over API: 1 pre-signed URL<br/>per download (refetch only if expired)

Testing

unit_test_download_functions.py — new tests spy on get_file_handle_for_download_async and download_from_url_multi_threaded to assert the batch endpoint is called exactly once and that the retrieved URL + file handle identifiers are forwarded; plus that the constructed DownloadRequest carries the identifiers (and that the identifier-less wiki path leaves them None).
unit_test_download_async.py — new TestSeededPresignedUrlProvider: a seeded unexpired URL is reused with zero refetch calls; a seeded near-expiry URL triggers exactly one refetch keyed on the correct identifiers (not None).
unit_test_download.py — updated test_multithread_true_s3_file_handle to reflect the new forwarding behavior.
Full regression run of the download, multithread_download, and wiki unit suites: 139 passed. ruff / black / isort clean on changed files.

Multi-threaded downloads (files > 8 MB) requested two pre-signed URLs: one in download_by_file_handle to choose the download strategy, and a second inside the multi-threaded downloader. The first URL's preSignedURL field was discarded, doubling the pre-signed URL count (and download metrics) for every large-file download. Forward the already-retrieved URL into download_from_url_multi_threaded so the downloader reuses it instead of fetching a second one. The file handle identifiers (file_handle_id/object_id/object_type) are now passed alongside the seeded URL so PresignedUrlProvider can still refetch a fresh URL if it expires mid-download. Wiki callers omit these identifiers, preserving their existing no-refetch behavior.

Copilot

Pull request overview

This PR reduces redundant /fileHandle/batch calls on the multi-threaded download path by forwarding the pre-signed URL retrieved during initial file-handle metadata lookup into download_from_url_multi_threaded, while also forwarding file-handle identifiers so the downloader can still refetch a URL mid-download if needed.

Changes:

Seed the multi-threaded downloader with a PresignedUrlInfo built from the initial get_file_handle_for_download_async response (avoids a second pre-signed URL fetch).
Extend the seeded-URL multi-threaded path to include file_handle_id / object_id / object_type in the constructed DownloadRequest to preserve refetch behavior.
Add/adjust unit tests to assert single batch call behavior and correct forwarding/refetch semantics.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
`synapseclient/core/download/download_functions.py`	Forwards initial `preSignedURL` into multi-threaded downloads and threads identifiers into `DownloadRequest`.
`tests/unit/synapseclient/core/unit_test_download.py`	Updates multi-threaded download test to assert the forwarded `PresignedUrlInfo`.
`tests/unit/synapseclient/core/download/unit_test_download_functions.py`	Adds tests asserting the batch endpoint is hit once and identifiers + URL are forwarded into multi-threaded downloads.
`tests/unit/synapseclient/core/download/unit_test_download_async.py`	Adds tests for a seeded `PresignedUrlProvider` reusing unexpired URLs and refetching expired ones using identifiers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        # Pass through the file handle identifiers so the downloader can refetch a
+        # fresh pre-signed URL if the seeded one expires mid-download. Callers that
+        # cannot refetch (e.g. wiki attachments) omit these, leaving them as None.
        request = DownloadRequest(
+            file_handle_id=int(file_handle_id) if file_handle_id is not None else None,


thomasyu888

🔥 Thanks for taking such a quick look at this. My biggest concern is whether this would work for large file downloads, and I think it will.

I'll defer to @linglp / @andrewelamb for final review

linglp · 2026-06-15T22:09:23Z

+        # Pass through the file handle identifiers so the downloader can refetch a
+        # fresh pre-signed URL if the seeded one expires mid-download. Callers that
+        # cannot refetch (e.g. wiki attachments) omit these, leaving them as None.
        request = DownloadRequest(


Could you clarify or add a comment about the asymmetry here? We are doing:

request = DownloadRequest( file_handle_id=int(file_handle_id) if file_handle_id is not None else None, object_id=object_id, object_type=object_type, path=temp_destination, debug=client.debug, presigned_url=presigned_url, )

But in the else statement, I saw:

request = DownloadRequest( file_handle_id=int(file_handle_id), object_id=object_id, object_type=object_type, path=temp_destination, debug=client.debug, )

I am curious why None guard is only needed in the if presigned_url is not None branch? Is it because file_handle_id is guaranteed non-None in else?

linglp

LGTM! Just one question

BryanFauble requested a review from a team as a code owner June 12, 2026 17:00

Copilot AI review requested due to automatic review settings June 12, 2026 17:00

Copilot started reviewing on behalf of BryanFauble June 12, 2026 17:00 View session

Copilot AI reviewed Jun 12, 2026

View reviewed changes

thomasyu888 reviewed Jun 12, 2026

View reviewed changes

linglp reviewed Jun 15, 2026

View reviewed changes

linglp approved these changes Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reuse pre-signed URL for multi-threaded downloads#1407

Reuse pre-signed URL for multi-threaded downloads#1407
BryanFauble wants to merge 1 commit into
developfrom
synpy-presigned-url-reuse

BryanFauble commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

thomasyu888 left a comment

Uh oh!

linglp Jun 15, 2026

Uh oh!

linglp left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

BryanFauble commented Jun 12, 2026

Problem

Solution

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

thomasyu888 left a comment

Choose a reason for hiding this comment

Uh oh!

linglp Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

linglp left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants