
fix(streamable-http): drain SSE response to EOF instead of closing early #2712

Open
whocareyw wants to merge 1 commit into modelcontextprotocol:main from whocareyw:fix/streamable-http-drain-sse-to-eof


@whocareyw

Summary

The streamable-http client closes the SSE response stream immediately after the
JSON-RPC response/error event arrives, before the stream reaches EOF. This happens
in three places:

  • _handle_sse_response (POST response) — await response.aclose(); return
  • _handle_reconnection (resumed GET stream) — await event_source.response.aclose(); return
  • _handle_resumption_request (explicit resumption GET stream) — await event_source.response.aclose(); break
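The difference between the current shape and the proposed one can be shown with a minimal, self-contained sketch. `SSEEvent`, `FakeSSEResponse`, `is_jsonrpc_response` and the two handler names are hypothetical stand-ins for the httpx-sse event source and the SDK's handlers, not the SDK's actual code; the fake response only records whether its body was read to EOF before `aclose()` was called.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SSEEvent:
    # Hypothetical stand-in for an httpx-sse ServerSentEvent.
    event: str
    data: str


class FakeSSEResponse:
    """Hypothetical SSE response that records whether it was read to EOF."""

    def __init__(self, events: list) -> None:
        self._events = events
        self.reached_eof = False
        self.closed_before_eof = False

    async def aiter_sse(self):
        for ev in self._events:
            await asyncio.sleep(0)  # yield control, like a network read
            yield ev
        self.reached_eof = True  # generator exhausted: body fully consumed

    async def aclose(self) -> None:
        if not self.reached_eof:
            self.closed_before_eof = True


def is_jsonrpc_response(ev: SSEEvent) -> bool:
    # Hypothetical check; the real client parses the JSON-RPC message.
    return ev.event == "message" and '"result"' in ev.data


async def handle_early_close(response: FakeSSEResponse, routed: asyncio.Queue) -> None:
    # Current shape: close as soon as the response event has been routed.
    async for ev in response.aiter_sse():
        await routed.put(ev)
        if is_jsonrpc_response(ev):
            await response.aclose()
            return


async def handle_drain_to_eof(response: FakeSSEResponse, routed: asyncio.Queue) -> None:
    # Proposed shape: no aclose()/return after the response is routed;
    # the loop keeps reading until the server ends the stream.
    async for ev in response.aiter_sse():
        await routed.put(ev)
    await response.aclose()  # in the SDK the enclosing `async with` does this


async def run(handler):
    # A progress notification, then the JSON-RPC result; then the stream ends.
    response = FakeSSEResponse([
        SSEEvent("message", '{"jsonrpc":"2.0","method":"notifications/progress"}'),
        SSEEvent("message", '{"jsonrpc":"2.0","id":1,"result":{}}'),
    ])
    routed: asyncio.Queue = asyncio.Queue()
    await handler(response, routed)
    return response, routed.qsize()
```

Both handlers route the same two events. Only the drain version lets the body reach EOF before the response is closed.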

Aborting before EOF returns the underlying keepalive connection to the pool
un-drained. Against most servers this is harmless, but against some it delays
the next request that reuses the connection by a fixed amount (about 260 ms in
the case described under Observed impact).

This PR drains each SSE stream to its natural EOF instead. The caller is still
unblocked the moment the response event is routed to read_stream; only the
background drain continues. Cancellation/shutdown still tears the stream down
promptly because CancelledError propagates out of aiter_sse() and the
enclosing async with closes the response.
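The claim that the caller is unblocked as soon as the response event is routed, while only the background drain keeps running, can be illustrated with a small asyncio sketch. The generator `body_with_slow_eof`, the 200 ms tail, and the function names are hypothetical; a plain `asyncio.Queue` stands in for the SDK's read_stream.

```python
import asyncio
import time


async def body_with_slow_eof():
    # Hypothetical SSE body: the JSON-RPC response arrives first, then the
    # server needs another 200 ms before it ends the stream (EOF).
    yield "response"
    await asyncio.sleep(0.2)


async def drain_to_eof(events, routed: asyncio.Queue, reached_eof: asyncio.Event) -> None:
    async for ev in events:
        await routed.put(ev)  # the waiting caller can resume as soon as this lands
    reached_eof.set()  # the loop only exits once the body is exhausted


async def call() -> tuple:
    routed: asyncio.Queue = asyncio.Queue()
    reached_eof = asyncio.Event()
    start = time.monotonic()
    drain_task = asyncio.create_task(drain_to_eof(body_with_slow_eof(), routed, reached_eof))
    result = await routed.get()  # what the pending request sees
    waited = time.monotonic() - start
    eof_when_unblocked = reached_eof.is_set()
    await drain_task  # the background drain finishes on its own
    return result, waited, eof_when_unblocked, reached_eof.is_set()
```

The caller gets its result almost immediately. At that moment the stream has not yet reached EOF, and the drain completes afterwards without the caller waiting on it.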

Addresses the early-close sites discussed in #2707.

Observed impact

I hit this against a FastMCP streamable-http server hosted on a background
thread inside a desktop app (Windows). Every serial call_tool paid a fixed
~260 ms penalty; send_ping and list_tools showed the same. Switching the
client to drain-to-EOF dropped it to ~7 ms — a 37× speedup — without touching
the server.

# same server, same shared client, 20 serial call_tool()
early close (current):  avg 265 ms / call
drain to EOF (this PR): avg   7 ms / call
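A serial timing loop of the kind behind these numbers could look like the sketch below. This is a hypothetical harness, not the script used for the measurement. It awaits one call at a time so that each call can reuse the connection released by the previous one.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def avg_latency_ms(call: Callable[[], Awaitable[object]], n: int = 20) -> float:
    """Await `call()` n times back to back and return the mean wall time in ms."""
    total = 0.0
    for _ in range(n):
        start = time.perf_counter()
        await call()  # serial: the next call starts only after this one returns
        total += time.perf_counter() - start
    return total * 1000 / n


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of two averages, e.g. early close vs. drain to EOF."""
    return before_ms / after_ms
```

With an initialized `mcp` `ClientSession`, the call could be something like `lambda: session.call_tool("echo", {"text": "hi"})`, where the tool `echo` is hypothetical. The two averages above give 265 / 7 ≈ 37.9.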

On reproducing it in CI

I was not able to reproduce the latency penalty against the SDK's own test
server (uvicorn + sse-starlette), even after pinning the test environment to
the exact server library versions where I observed the stall
(sse-starlette==3.3.2, starlette==0.52.1, uvicorn==0.34.3, mcp 1.27.1
server). It measured ~4 ms per call with and without this change.

So the stall appears to be an interaction between the client's early-close and a
particular server runtime (in my case the MCP server running on a background
thread co-resident with a GUI event loop), rather than something the standalone
test server exhibits. That means a latency-based acceptance test would not catch
a regression here in CI.

I've therefore framed the change as keepalive hygiene / client robustness:
draining the response body before releasing a pooled connection is the safer
client behavior and doesn't rely on the server gracefully handling an abrupt
mid-stream close. If you can point me at a way to deterministically exercise the
un-drained-connection path in CI, I'm happy to add a regression test.

Testing

  • All existing tests/shared/test_streamable_http.py pass (61/61), including the
    reconnection / resumption / replay / polling paths that this change touches
    (test_streamable_http_client_auto_reconnects,
    test_streamable_http_multiple_reconnections,
    test_streamable_http_events_replayed_after_disconnect,
    test_streamable_http_sse_polling_full_cycle,
    test_standalone_get_stream_reconnection).
  • ruff check / ruff format --check clean on the changed file.

Notes for review

  • The _handle_resumption_request site (third bullet above) wasn't called out
    explicitly in the issue discussion, but it has the identical early-close shape,
    so I applied the same change for consistency. Happy to split it out if you'd
    prefer.
  • Behavior change to be aware of: for a (non-conformant) server that sends the
    response and then holds the SSE stream open indefinitely, the client now keeps
    reading until shutdown cancels it, instead of returning immediately. For
    spec-conformant servers the stream closes right after the response, so EOF
    arrives within milliseconds.
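The behavior change for a non-conformant server can be sketched as follows. `HangingResponse` and `drain` are hypothetical stand-ins, and `__aexit__` plays the role of the enclosing `async with` that closes the response. The caller still receives the result, the drain stays pending, and cancelling it at shutdown raises `CancelledError` out of the `async for`, which closes the response on the way out.

```python
import asyncio


class HangingResponse:
    """Hypothetical non-conformant SSE response: sends the result, never ends."""

    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        self.closed = True  # stands in for the `async with` closing the response
        return False  # do not swallow CancelledError

    async def aiter_sse(self):
        yield "response"
        await asyncio.Event().wait()  # the server never sends EOF


async def drain(response: HangingResponse, routed: asyncio.Queue) -> None:
    async with response:
        async for ev in response.aiter_sse():
            await routed.put(ev)


async def main() -> tuple:
    response = HangingResponse()
    routed: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(drain(response, routed))
    first = await routed.get()  # the caller is unblocked by the response event
    await asyncio.sleep(0.05)
    still_draining = not task.done()  # still waiting for an EOF that never comes
    task.cancel()  # shutdown cancels the background drain
    try:
        await task
    except asyncio.CancelledError:
        pass
    return first, still_draining, response.closed, task.cancelled()
```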

The client closed the SSE response stream immediately after the
JSON-RPC response/error event arrived (`await response.aclose()` /
`break`), in three places:
  - _handle_sse_response (POST response)
  - _handle_reconnection (resumed GET stream)
  - _handle_resumption_request (explicit resumption GET stream)

Aborting the stream before EOF leaves the underlying keepalive
connection un-drained. With some servers this stalls the next request
that reuses the connection by a fixed delay (~260ms observed against a
FastMCP streamable-http server hosted on a background thread inside a
desktop app; 37x slower per call).

Drain the stream to its natural EOF instead: the caller is still
unblocked as soon as the response event is routed to read_stream, and
the connection is returned to the pool cleanly. Cancellation/shutdown
still tears the stream down promptly because CancelledError propagates
out of aiter_sse() and the enclosing `async with` closes the response.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>