Skip to content

fix(tcpclient, iostream): close SSLIOStream on TLS handshake timeout to prevent fd leak#3636

Open
armorbreak001 wants to merge 1 commit into
tornadoweb:masterfrom
armorbreak001:fix/tls-handshake-timeout-socket-leak
Open

fix(tcpclient, iostream): close SSLIOStream on TLS handshake timeout to prevent fd leak#3636
armorbreak001 wants to merge 1 commit into
tornadoweb:masterfrom
armorbreak001:fix/tls-handshake-timeout-socket-leak

Conversation

@armorbreak001

Copy link
Copy Markdown

Problem (tornado#3614)

When TCPClient.connect() is called with both ssl_options and timeout, a TLS handshake timeout causes permanent file descriptor leak.

Root Cause

start_tls() transfers socket ownership in three steps:

  1. Extract raw socket from original IOStreamself.socket = None
  2. Wrap in SSL → create new SSLIOStream (local variable only)
  3. Return Future[SSLIOStream] that resolves when handshake completes

When gen.with_timeout() fires:

  • Caller gets TimeoutError
  • Original stream's .socket is already Noneclose() is no-op
  • The SSLIOStream (holding the real fd) is only reachable through the inner Future
  • gen.with_timeout explicitly does not cancel the wrapped Future
  • SSLIOStream stays on IOLoop forever → fd leaked permanently

Impact

Every timed-out TLS connection leaks one file descriptor. Long-running services that retry connections (e.g. HTTP clients with short timeouts) will eventually hit the OS fd limit (EMFILE / "too many open files").

Fix

Two files, minimal change:

tornado/iostream.py

In start_tls(): store the SSLIOStream as self._tls_stream before returning the future.

self._tls_stream = ssl_stream  # Keep reference for timeout cleanup

tornado/tcpclient.py

In connect(): on TimeoutError during start_tls(), close _tls_stream.

except gen.TimeoutError:
    tls_stream = getattr(stream, "_tls_stream", None)
    if tls_stream is not None:
        tls_stream.close()
        stream._tls_stream = None
    raise

Testing

New test: TestTLSHandshakeTimeoutCleanup.test_tls_handshake_timeout_closes_stream

  • Opens TCP server socket (no TLS accept)
  • Calls TCPClient.connect() with SSL + 0.01s timeout
  • Expects TimeoutError
  • Verifies no fd leaked via /proc/self/fd count

All existing tests pass:

  • tcpclient_test.py: ✅ 27 passed, 4 skipped
  • iostream_test.py: ✅ 142 passed, 61 skipped

Fixes #3614

…to prevent fd leak (tornado#3614)

When TCPClient.connect() times out during the TLS handshake phase,
start_tls() has already transferred socket ownership to a new
SSLIOStream and set the original stream's socket to None. The caller's
TimeoutError handler could only call stream.close(), which is a no-op
since the socket is already gone. Meanwhile the SSLIOStream (holding
the real fd) remains registered on the IOLoop forever — leaking the
file descriptor.

Fix:
- In IOStream.start_tls(): store the new SSLIOStream as self._tls_stream
  so callers can reach it after timeout
- In TCPClient.connect(): on gen.with_timeout TimeoutError during
  start_tls(), close stream._tls_stream to release the fd
- Add regression test: TestTLSHandshakeTimeoutCleanup verifies
  no file descriptor leak after TLS handshake timeout

This is a targeted fix with minimal API surface change. The _tls_stream
attribute is only set during start_tls() and cleared after cleanup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Socket leak in TCPClient.connect when TLS handshake times out

1 participant