fix(tcpclient): close SSLIOStream on TLS handshake timeout to prevent socket leak #3615
armorbreak001 wants to merge 2 commits into
… socket leak

When TCPClient.connect() times out during the TLS handshake, the socket is transferred to an SSLIOStream inside start_tls() but never cleaned up because gen.with_timeout does not cancel the wrapped future. This leaves the socket fd permanently leaked.

Save the start_tls future and cancel it on TimeoutError, which triggers cleanup of the underlying SSLIOStream via a new cancel callback registered in IOStream.start_tls().

Fixes tornadoweb#3614
    ssl_stream.read_chunk_size = self.read_chunk_size

    def _on_cancel(fut: Future[SSLIOStream]) -> None:
        if fut.cancelled():
This needs a test to make sure we are in fact closing the stream here. I think that when tornado.gen.with_timeout reaches its timeout, the Future is terminated with a TimeoutError, but not cancelled, so this condition is never true (the call to tls_future.cancel() in tcpclient is a no-op because the future has already been terminated).
What we probably want to do here is close the stream in a done_callback if fut.exception() is None.
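The distinction between a future that finished with an exception and one that was cancelled can be checked with plain asyncio futures (in Tornado 5 and later, `tornado.concurrent.Future` is an alias for `asyncio.Future`). The sketch below is only an illustration of those state rules; it does not use Tornado, and the function name `inspect_future_states` is made up for this example. It does not settle which state the `start_tls()` future is actually in when `gen.with_timeout` fires.

```python
import asyncio


async def inspect_future_states() -> dict:
    """Illustrative only: compare a failed future with a cancelled one."""
    loop = asyncio.get_running_loop()

    # Case 1: a future that already finished with an exception.
    failed = loop.create_future()
    failed.set_exception(TimeoutError("handshake timed out"))
    failed_cancelled = failed.cancelled()   # finished, but not cancelled
    failed_cancel_result = failed.cancel()  # cancel() on a done future does nothing
    failed.exception()  # retrieve the exception so asyncio does not warn about it

    # Case 2: a future that is still pending when cancel() is called.
    pending = loop.create_future()
    seen_by_callback = []
    pending.add_done_callback(lambda f: seen_by_callback.append(f.cancelled()))
    pending_cancel_result = pending.cancel()
    await asyncio.sleep(0)  # yield once so the scheduled done callback runs

    return {
        "failed_cancelled": failed_cancelled,
        "failed_cancel_result": failed_cancel_result,
        "pending_cancel_result": pending_cancel_result,
        "seen_by_callback": seen_by_callback,
    }


states = asyncio.run(inspect_future_states())
```

A callback guarded by `fut.cancelled()` therefore only runs its body in the second case, which is why a test that observes the stream actually being closed is useful here.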
…out test

The _on_cancel callback in start_tls() calls ssl_stream.close() when the TLS future is cancelled (timeout). However, _signal_closed() calls _ssl_connect_future.exception() which raises CancelledError on cancelled futures, preventing proper cleanup.

Wrap the exception() call in try/except CancelledError to ensure close() completes successfully even when the SSL connect future was cancelled.

Add test_tls_handshake_timeout_closes_stream to verify that:
- TimeoutError is raised on TLS handshake timeout
- The SSLIOStream is properly closed (no fd leak)
- Addresses bdarnell's review feedback about needing a test
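The `exception()` behaviour described in this commit message can be reproduced with a bare asyncio future. The following is a minimal sketch, not the code from the patch: `exception_or_none` and `show_cancelled_exception` are hypothetical names, and the helper assumes the future is already done.

```python
import asyncio
from typing import Optional


def exception_or_none(fut: "asyncio.Future") -> Optional[BaseException]:
    """Hypothetical helper: read a done future's exception without
    letting CancelledError escape when the future was cancelled."""
    try:
        return fut.exception()
    except asyncio.CancelledError:
        return None


async def show_cancelled_exception() -> tuple:
    fut = asyncio.get_running_loop().create_future()
    fut.cancel()

    # Unguarded call: exception() raises instead of returning a value.
    raised = False
    try:
        fut.exception()
    except asyncio.CancelledError:
        raised = True

    # Guarded call: cleanup code can continue past this point.
    return raised, exception_or_none(fut)


raised_unguarded, guarded_result = asyncio.run(show_cancelled_exception())
```

The guard matters in cleanup paths such as `close()`, where an escaping `CancelledError` would skip the remaining teardown steps.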
Thanks again for the review, @bdarnell. I've added a test and also found/fixed a related bug:

New test:

Additional fix: While writing the test, I discovered that

The test passes along with all existing tcpclient tests (27 passed).
Summary
When TCPClient.connect() is called with both ssl_options and timeout, a TLS handshake timeout causes the underlying socket to leak. The socket becomes unreachable from the caller and is never closed.

Root cause:

IOStream.start_tls() transfers socket ownership to a new SSLIOStream before the TLS handshake completes (the raw socket is extracted, the original stream sets self.socket = None, and an SSLIOStream wrapping the socket is created as a local variable inside start_tls()). When gen.with_timeout fires, the TimeoutError is raised but the wrapped future is not cancelled (by design of gen.with_timeout), leaving the SSLIOStream registered on the IOLoop with no external reference and permanently leaking the file descriptor.

Fix

- tcpclient.py: Save the future returned by start_tls() and call tls_future.cancel() on TimeoutError before re-raising.
- iostream.py: Register a done callback on the TLS connect future inside start_tls() that closes the SSLIOStream when the future is cancelled, ensuring the socket fd is released and the handler is removed from the IOLoop.

Fixes #3614
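The pattern described in this summary can be modelled without Tornado. The sketch below is an analogy under stated assumptions, not the patch itself: `FakeStream`, `start_handshake`, and `connect_with_timeout` are hypothetical stand-ins, and `asyncio.wait_for(asyncio.shield(...))` is used to imitate a timeout that leaves the wrapped future pending, which is the behaviour the summary attributes to `gen.with_timeout`.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for a stream that owns a socket."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def start_handshake(stream: FakeStream) -> "asyncio.Future":
    """Return a future that never completes, like a stalled handshake."""
    fut = asyncio.get_running_loop().create_future()

    def _on_done(f: "asyncio.Future") -> None:
        # Release the resource if the caller gave up on the handshake.
        if f.cancelled():
            stream.close()

    fut.add_done_callback(_on_done)
    return fut


async def connect_with_timeout(stream: FakeStream, timeout: float):
    fut = start_handshake(stream)
    try:
        # shield() keeps the timeout from cancelling fut itself.
        return await asyncio.wait_for(asyncio.shield(fut), timeout)
    except asyncio.TimeoutError:
        fut.cancel()  # explicit cancel triggers the cleanup callback
        raise


async def run_demo() -> tuple:
    stream = FakeStream()
    timed_out = False
    try:
        await connect_with_timeout(stream, 0.01)
    except asyncio.TimeoutError:
        timed_out = True
    await asyncio.sleep(0)  # let the done callback run
    return timed_out, stream.closed


timed_out, stream_closed = asyncio.run(run_demo())
```

Removing the `fut.cancel()` line in this model leaves `stream.closed` as `False` after the timeout, which corresponds to the leak the summary describes.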