fix(streamable-http): drain SSE response to EOF instead of closing early#2712
Open
whocareyw wants to merge 1 commit into
Open
fix(streamable-http): drain SSE response to EOF instead of closing early#2712whocareyw wants to merge 1 commit into
whocareyw wants to merge 1 commit into
Conversation
The client closed the SSE response stream immediately after the JSON-RPC response/error event arrived (`await response.aclose()` / `break`), in three places: - _handle_sse_response (POST response) - _handle_reconnection (resumed GET stream) - _handle_resumption_request (explicit resumption GET stream) Aborting the stream before EOF leaves the underlying keepalive connection un-drained. With some servers this stalls the next request that reuses the connection by a fixed delay (~260ms observed against a FastMCP streamable-http server hosted on a background thread inside a desktop app; 37x slower per call). Drain the stream to its natural EOF instead: the caller is still unblocked as soon as the response event is routed to read_stream, and the connection is returned to the pool cleanly. Cancellation/shutdown still tears the stream down promptly because CancelledError propagates out of aiter_sse() and the enclosing `async with` closes the response. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The streamable-http client closes the SSE response stream immediately after the
JSON-RPC response/error event arrives, before the stream reaches EOF. This happens
in three places:
_handle_sse_response(POST response) —await response.aclose(); return_handle_reconnection(resumed GET stream) —await event_source.response.aclose(); return_handle_resumption_request(explicit resumption GET stream) —await event_source.response.aclose(); breakAborting before EOF returns the underlying keepalive connection to the pool
un-drained. Against most servers this is harmless, but against some it stalls the
next request that reuses the connection by a fixed delay.
This PR drains each SSE stream to its natural EOF instead. The caller is still
unblocked the moment the response event is routed to
read_stream; only thebackground drain continues. Cancellation/shutdown still tears the stream down
promptly because
CancelledErrorpropagates out ofaiter_sse()and theenclosing
async withcloses the response.Addresses the early-close sites discussed in #2707.
Observed impact
I hit this against a FastMCP
streamable-httpserver hosted on a backgroundthread inside a desktop app (Windows). Every serial
call_toolpaid a fixed~260 ms penalty;
send_pingandlist_toolsshowed the same. Switching theclient to drain-to-EOF dropped it to ~7 ms — a 37× speedup — without touching
the server.
On reproducing it in CI
I was not able to reproduce the latency penalty against the SDK's own test
server (
uvicorn+sse-starlette), even after pinning the test environment tothe exact server library versions where I observed the stall
(
sse-starlette==3.3.2,starlette==0.52.1,uvicorn==0.34.3,mcp1.27.1server). It measured ~4 ms per call with and without this change.
So the stall appears to be an interaction between the client's early-close and a
particular server runtime (in my case the MCP server running on a background
thread co-resident with a GUI event loop), rather than something the standalone
test server exhibits. That means a latency-based acceptance test would not guard
this on CI.
I've therefore framed the change as keepalive hygiene / client robustness:
draining the response body before releasing a pooled connection is the safer
client behavior and doesn't rely on the server gracefully handling an abrupt
mid-stream close. If you can point me at a way to deterministically exercise the
un-drained-connection path in CI, I'm happy to add a regression test.
Testing
tests/shared/test_streamable_http.pypass (61/61), including thereconnection / resumption / replay / polling paths that this change touches
(
test_streamable_http_client_auto_reconnects,test_streamable_http_multiple_reconnections,test_streamable_http_events_replayed_after_disconnect,test_streamable_http_sse_polling_full_cycle,test_standalone_get_stream_reconnection).ruff check/ruff format --checkclean on the changed file.Notes for review
_handle_resumption_requestsite (third bullet above) wasn't called outexplicitly in the issue discussion, but it has the identical early-close shape,
so I applied the same change for consistency. Happy to split it out if you'd
prefer.
response and then holds the SSE stream open indefinitely, the client now keeps
reading until shutdown cancels it, instead of returning immediately. For
spec-conformant servers the stream closes right after the response, so EOF is
immediate (~ms).