Skip to content

fix(tcpclient): close SSLIOStream on TLS handshake timeout to prevent socket leak#3615

Open
armorbreak001 wants to merge 2 commits into
tornadoweb:masterfrom
armorbreak001:fix/tls-timeout-socket-leak
Open

fix(tcpclient): close SSLIOStream on TLS handshake timeout to prevent socket leak#3615
armorbreak001 wants to merge 2 commits into
tornadoweb:masterfrom
armorbreak001:fix/tls-timeout-socket-leak

Conversation

@armorbreak001

Copy link
Copy Markdown

Summary

When TCPClient.connect() is called with both ssl_options and timeout, a TLS handshake timeout causes the underlying socket to leak. The socket becomes unreachable from the caller and is never closed.

Root cause: IOStream.start_tls() transfers socket ownership to a new SSLIOStream before the TLS handshake completes (the raw socket is extracted, the original stream sets self.socket = None, and an SSLIOStream wrapping the socket is created as a local variable inside start_tls()). When gen.with_timeout fires, the TimeoutError is raised but the wrapped future is not cancelled (by design of gen.with_timeout), leaving the SSLIOStream registered on the IOLoop with no external reference — permanently leaking the file descriptor.

Fix

  • In tcpclient.py: Save the future returned by start_tls() and call tls_future.cancel() on TimeoutError before re-raising.
  • In iostream.py: Register a done callback on the TLS connect future inside start_tls() that closes the SSLIOStream when the future is cancelled, ensuring the socket fd is released and the handler is removed from the IOLoop.

Fixes #3614

… socket leak

When TCPClient.connect() times out during the TLS handshake,
the socket is transferred to an SSLIOStream inside start_tls()
but never cleaned up because gen.with_timeout does not cancel
the wrapped future. This leaves the socket fd permanently leaked.

Save the start_tls future and cancel it on TimeoutError, which
triggers cleanup of the underlying SSLIOStream via a new cancel
callback registered in IOStream.start_tls().

Fixes tornadoweb#3614

@bdarnell bdarnell left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another PR for a socket leak in tcpclient in #3579. It's possible that these have the same root cause.

Comment thread tornado/iostream.py
ssl_stream.read_chunk_size = self.read_chunk_size

def _on_cancel(fut: Future[SSLIOStream]) -> None:
if fut.cancelled():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a test to make sure we are in fact closing the stream here. I think that when tornado.gen.with_timeout reaches its timeout, the Future is terminated with a TimeoutError, but not cancelled, so this condition is never true (the call to tls_future.cancel() in tcpclient is a no-op because the future has already been terminated).

What we probably want to do here is close the stream in a done_callback if fut.exception() is None.

…out test

The _on_cancel callback in start_tls() calls ssl_stream.close() when
the TLS future is cancelled (timeout). However, _signal_closed() calls
_ssl_connect_future.exception() which raises CancelledError on cancelled
futures, preventing proper cleanup.

Wrap the exception() call in try/except CancelledError to ensure
close() completes successfully even when the SSL connect future was
cancelled.

Add test_tls_handshake_timeout_closes_stream to verify that:
- TimeoutError is raised on TLS handshake timeout
- The SSLIOStream is properly closed (no fd leak)
- Addresses bdarnell's review feedback about needing a test
@armorbreak001

Copy link
Copy Markdown
Author

Thanks again for the review, @bdarnell. I've added a test and also found/fixed a related bug:

New test: test_tls_handshake_timeout_closes_stream — verifies that when the TLS handshake times out, the SSLIOStream is properly closed with no fd leak.

Additional fix: While writing the test, I discovered that _signal_closed() was calling _ssl_connect_future.exception() on a cancelled future, which raises CancelledError. This prevented close() from completing successfully. Added a try/except CancelledError around that call (same pattern already used for the regular future in the same method).

The test passes along with all existing tcpclient tests (27 passed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Socket leak in TCPClient.connect when TLS handshake times out

2 participants