fix(streamable-http): drain SSE response to EOF instead of closing early by whocareyw · Pull Request #2712 · modelcontextprotocol/python-sdk

whocareyw · 2026-05-29T02:10:45Z

Summary

The streamable-http client closes the SSE response stream immediately after the
JSON-RPC response/error event arrives, before the stream reaches EOF. This happens
in three places:

_handle_sse_response (POST response) — await response.aclose(); return
_handle_reconnection (resumed GET stream) — await event_source.response.aclose(); return
_handle_resumption_request (explicit resumption GET stream) — await event_source.response.aclose(); break

Aborting before EOF returns the underlying keepalive connection to the pool
un-drained. Against most servers this is harmless, but against some it stalls the
next request that reuses the connection by a fixed delay.

This PR drains each SSE stream to its natural EOF instead. The caller is still
unblocked the moment the response event is routed to read_stream; only the
background drain continues. Cancellation/shutdown still tears the stream down
promptly because CancelledError propagates out of aiter_sse() and the
enclosing async with closes the response.

Addresses the early-close sites discussed in #2707.

Observed impact

I hit this against a FastMCP streamable-http server hosted on a background
thread inside a desktop app (Windows). Every serial call_tool paid a fixed
~260 ms penalty; send_ping and list_tools showed the same. Switching the
client to drain-to-EOF dropped it to ~7 ms — a 37× speedup — without touching
the server.

# same server, same shared client, 20 serial call_tool()
early close (current):  avg 265 ms / call
drain to EOF (this PR): avg   7 ms / call

On reproducing it in CI

I was not able to reproduce the latency penalty against the SDK's own test
server (uvicorn + sse-starlette), even after pinning the test environment to
the exact server library versions where I observed the stall
(sse-starlette==3.3.2, starlette==0.52.1, uvicorn==0.34.3, mcp 1.27.1
server). It measured ~4 ms per call with and without this change.

So the stall appears to be an interaction between the client's early-close and a
particular server runtime (in my case the MCP server running on a background
thread co-resident with a GUI event loop), rather than something the standalone
test server exhibits. That means a latency-based acceptance test would not guard
this on CI.

I've therefore framed the change as keepalive hygiene / client robustness:
draining the response body before releasing a pooled connection is the safer
client behavior and doesn't rely on the server gracefully handling an abrupt
mid-stream close. If you can point me at a way to deterministically exercise the
un-drained-connection path in CI, I'm happy to add a regression test.

Testing

All existing tests/shared/test_streamable_http.py pass (61/61), including the
reconnection / resumption / replay / polling paths that this change touches
(test_streamable_http_client_auto_reconnects,
test_streamable_http_multiple_reconnections,
test_streamable_http_events_replayed_after_disconnect,
test_streamable_http_sse_polling_full_cycle,
test_standalone_get_stream_reconnection).
ruff check / ruff format --check clean on the changed file.

Notes for review

The _handle_resumption_request site (third bullet above) wasn't called out
explicitly in the issue discussion, but it has the identical early-close shape,
so I applied the same change for consistency. Happy to split it out if you'd
prefer.
Behavior change to be aware of: for a (non-conformant) server that sends the
response and then holds the SSE stream open indefinitely, the client now keeps
reading until shutdown cancels it, instead of returning immediately. For
spec-conformant servers the stream closes right after the response, so EOF is
immediate (~ms).

The client closed the SSE response stream immediately after the JSON-RPC response/error event arrived (`await response.aclose()` / `break`), in three places: - _handle_sse_response (POST response) - _handle_reconnection (resumed GET stream) - _handle_resumption_request (explicit resumption GET stream) Aborting the stream before EOF leaves the underlying keepalive connection un-drained. With some servers this stalls the next request that reuses the connection by a fixed delay (~260ms observed against a FastMCP streamable-http server hosted on a background thread inside a desktop app; 37x slower per call). Drain the stream to its natural EOF instead: the caller is still unblocked as soon as the response event is routed to read_stream, and the connection is returned to the pool cleanly. Cancellation/shutdown still tears the stream down promptly because CancelledError propagates out of aiter_sse() and the enclosing `async with` closes the response. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

he-yufeng mentioned this pull request May 29, 2026

fix: drain terminal streamable HTTP responses #2721

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(streamable-http): drain SSE response to EOF instead of closing early#2712

fix(streamable-http): drain SSE response to EOF instead of closing early#2712
whocareyw wants to merge 1 commit into
modelcontextprotocol:mainfrom
whocareyw:fix/streamable-http-drain-sse-to-eof

whocareyw commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

whocareyw commented May 29, 2026

Summary

Observed impact

On reproducing it in CI

Testing

Notes for review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant