Add missing timeouts to put.io requests that can hang and stall all downloads (fixes #32)#33
Open
bugrax wants to merge 4 commits into
Open
Add missing timeouts to put.io requests that can hang and stall all downloads (fixes #32)#33bugrax wants to merge 4 commits into
bugrax wants to merge 4 commits into
Conversation
…e#32) fetch only set a connect_timeout and a per-chunk timeout around the byte stream. The req.send() call (connect + waiting for response headers) had no timeout of its own, so if put.io accepted the connection but stalled before sending headers, send() blocked forever. The download worker parked there, the orchestration worker blocked on its done channel, and once the few download workers were stuck the whole process stopped pulling (a restart only cleared it temporarily). Wrap req.send() in a 60s timeout; a stalled request now errors and the existing retry loop resumes it instead of hanging.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a request-level timeout around reqwest’s send() to prevent stalled connections from blocking download workers indefinitely.
Changes:
- Wrap
req.send()intokio::time::timeoutto bound time waiting for response headers. - Add explanatory inline comment describing the stall scenario and retry behavior.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
These three put.io API calls were missing the .timeout() the other put.io calls already have. list_files and url are called from get_download_targets for every transfer; when put.io accepted the connection but stalled, they hung with no timeout, freezing the orchestration worker before any download even started — the real cause of the recurring download stall (the download send() timeout in the previous commit covers a later point in the same path).
Every put.io call built a fresh reqwest::Client, so connections were never reused. Resuming a large account generates targets for every transfer (list_files + url each), and the resulting client/socket churn exhausts file descriptors until requests hang with no progress and no error. Share a single connection-pooled client via OnceLock instead.
Extract the request and stream-idle timeouts into REQUEST_TIMEOUT and STREAM_IDLE_TIMEOUT constants, and include the duration (and target url) in the timeout messages instead of a hard-coded host/duration.
Contributor
Author
|
Addressed (commit 2137007): extracted the two timeouts into |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #32.
Problem
Several put.io HTTP calls could block indefinitely if put.io accepts the connection but stalls before responding (which happens under load), because they had no request timeout — only the shared download client's
connect_timeout.The worst offenders are
list_filesandurl, called fromget_download_targetsfor every transfer. When they hang, the orchestration worker that called them is stuck before any download starts; once the few workers are stuck the whole process stops pulling while transfers pile upCOMPLETEDon put.io. A restart only clears it temporarily.Observed live: after a restart with ~100
COMPLETEDtransfers, a handful loggenerating targetsand then nothing —get_download_targetsis parked insidelist_files/urlwith no timeout.Fix
.timeout(30s)tolist_files,url, andaccount_info(the put.io calls that were missing it; the others already had one).req.send()in a 60stokio::time::timeoutas well, so a stalled download request (afterget_download_targetssucceeds) also fails and is retried/resumed by the existing loop instead of hanging.Together these ensure no single hung put.io request can freeze a worker indefinitely.