S3 connection hangs when S3 Put request times out server side. #291

@nesh170

Hi google friends! Apologies for the trouble!! We found this interesting issue when using tensorstore!! Thanks for taking a look

Summary

When an S3 PUT request times out server-side, the TensorStore client can hang for ~30 minutes before retrying. During this window, S3 access logs show zero retries: the request appears to be "in flight" on the client but is dead on the server. The same access logs confirm that S3 received the original request.

The ~30-minute gap matches the default OS-level TCP keepalive + retransmission timeout, which is the only thing that eventually forces the stalled socket closed.

Root Cause

Two layers of missing timeout configuration compound each other:

1. S3 driver issues requests with no timeouts (s3_key_value_store.cc:764)

auto future = owner->transport_->IssueRequest(
    request, internal_http::IssueRequestOptions(value_));
// IssueRequestOptions only sets payload; request_timeout and connect_timeout
// remain at their defaults of absl::ZeroDuration().
2. Zero duration → curl sets nothing (curl_transport.cc:198-205)

if (options.request_timeout > absl::ZeroDuration()) {  // never true
  handle_.SetOption(CURLOPT_TIMEOUT_MS, ...);
}

CURLOPT_TIMEOUT_MS is never set, so libcurl has no deadline on the transfer.
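
To make the interaction of the two layers concrete, here is a minimal model of the guard that does not depend on libcurl or TensorStore. The names `TransportTimeouts` and `TimeoutMsToApply` are hypothetical and exist only for this illustration; the sketch assumes nothing beyond the behavior quoted above (a timeout is forwarded only when it is strictly positive).

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical stand-in for the per-request options; not TensorStore's type.
struct TransportTimeouts {
  std::chrono::milliseconds request_timeout{0};  // zero means "unset"
  std::chrono::milliseconds connect_timeout{0};  // zero means "unset"
};

// Models the guard quoted above. Returns the value that would be passed to
// CURLOPT_TIMEOUT_MS, or std::nullopt when the option would never be set.
std::optional<long> TimeoutMsToApply(std::chrono::milliseconds timeout) {
  assert(timeout >= std::chrono::milliseconds::zero());  // negative is invalid
  if (timeout > std::chrono::milliseconds::zero()) {
    return static_cast<long>(timeout.count());
  }
  return std::nullopt;  // no deadline: the transfer can wait indefinitely
}
```

With default-constructed options, `TimeoutMsToApply(TransportTimeouts{}.request_timeout)` yields no value, which corresponds to the S3 driver's current behavior as described in this issue. Any positive duration, such as `std::chrono::seconds(30)`, would produce a deadline.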

Workaround

Set environment variables before launching your process:

export TENSORSTORE_CURL_LOW_SPEED_TIME_SECONDS=30
export TENSORSTORE_CURL_LOW_SPEED_LIMIT_BYTES=1

This enables the low-speed watchdog: if fewer than 1 byte/sec is transferred for 30 seconds, libcurl aborts the request and the retry logic takes over.
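
For anyone wondering how the watchdog decides to abort, below is a simplified model of the rule, assuming the environment variables map onto libcurl's `CURLOPT_LOW_SPEED_TIME` and `CURLOPT_LOW_SPEED_LIMIT`. It is not libcurl's implementation (libcurl works with an average speed, not per-second samples), and `LowSpeedAbortTime` is a hypothetical name used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Simplified low-speed watchdog. bytes_per_second[i] is the number of bytes
// transferred during second i. Returns the elapsed time in seconds at which
// the transfer would be aborted, or std::nullopt if it never stalls for
// low_speed_time_s consecutive seconds.
std::optional<std::size_t> LowSpeedAbortTime(
    const std::vector<long>& bytes_per_second, long limit_bytes_per_s,
    std::size_t low_speed_time_s) {
  assert(low_speed_time_s > 0);
  std::size_t slow_run = 0;  // consecutive seconds below the limit
  for (std::size_t i = 0; i < bytes_per_second.size(); ++i) {
    if (bytes_per_second[i] < limit_bytes_per_s) {
      if (++slow_run >= low_speed_time_s) return i + 1;
    } else {
      slow_run = 0;  // any second at or above the limit resets the window
    }
  }
  return std::nullopt;
}
```

In this model, a request that sends data for 2 seconds and then goes silent is aborted after 32 seconds with the settings above, instead of ~30 minutes. Note that a connection trickling at exactly 1 byte/sec is not below the limit and would not be aborted.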

Suggested fix

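  • Expose request timeout and connect timeout as configurable options on the S3 driver, so that a stalled request is aborted and retried instead of waiting on OS-level TCP timeouts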
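
One possible shape for this, purely as a sketch: driver-level settings that are copied into each request's options, with non-zero fallbacks when the user sets nothing. Every name here (`S3TimeoutConfig`, `IssueOptions`, `MakeIssueOptions`) is hypothetical and does not correspond to an existing TensorStore API; the fallback values are left to the caller because this issue does not propose specific defaults.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using Ms = std::chrono::milliseconds;

// Hypothetical driver-level settings, e.g. parsed from the driver's spec.
struct S3TimeoutConfig {
  std::optional<Ms> request_timeout;
  std::optional<Ms> connect_timeout;
};

// Hypothetical per-request options, modeled on the two fields named above.
struct IssueOptions {
  Ms request_timeout{0};
  Ms connect_timeout{0};
};

// Copies configured timeouts into the per-request options. Unset values fall
// back to caller-supplied positive defaults rather than to zero, so the
// "timeout > 0" guard in the transport always applies a deadline.
IssueOptions MakeIssueOptions(const S3TimeoutConfig& config,
                              Ms default_request_timeout,
                              Ms default_connect_timeout) {
  assert(default_request_timeout > Ms::zero());
  assert(default_connect_timeout > Ms::zero());
  IssueOptions options;
  options.request_timeout =
      config.request_timeout.value_or(default_request_timeout);
  options.connect_timeout =
      config.connect_timeout.value_or(default_connect_timeout);
  return options;
}
```

The design choice worth discussing is whether unset timeouts should fall back to a finite default, as sketched here, or keep today's "no deadline" behavior for backward compatibility.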
