Connection can block indefinitely on a stalled send (no send_timeout), defeating heartbeat self-heal and growing the process heap until OOM

Related to #75 — that issue is the read-side / handshake variant of the same underlying problem ("the `Slipstream.Connection` GenServer wedges on a blocking I/O op with no timeout and never self-heals"). This one is the **write side**, on an already-established connection.

### What happens

`Slipstream.Connection` sends synchronously with no send timeout: `Impl.push_message/2` → [`Mint.WebSocket.stream_request_body/3`](https://github.com/CuatroElixir/slipstream/blob/v1.2.2/lib/slipstream/connection/impl.ex#L51-L66) → `:ssl.send/2`. The heartbeat takes the exact same path ([`push_heartbeat/1` → `push_message/2`](https://github.com/CuatroElixir/slipstream/blob/v1.2.2/lib/slipstream/connection/impl.ex#L90-L98)).

On a **half-open** connection (peer stops ACKing — a brief network blip, a dead load-balancer path, an overloaded server), the kernel TCP send buffer fills and `:ssl.send/2` blocks **indefinitely**. The Connection GenServer is now stuck inside that send and can't process its next message. Two consequences:

1. **The heartbeat self-heal never fires.** The "previous heartbeat still unacked → close and reconnect" check runs on the *next* `SendHeartbeat` tick ([pipeline.ex#L229-L238](https://github.com/CuatroElixir/slipstream/blob/v1.2.2/lib/slipstream/connection/pipeline.ex#L229-L238)), but the blocked process can't reach the next tick. The one mechanism designed to recover a dead connection is starved by the very condition it exists to detect.

2. **The process heap grows without bound.** Every message delivered to the blocked Connection (PubSub, pushes, timers) is queued in its mailbox, which under the default `message_queue_data: :on_heap` setting counts toward the process heap. A blocked GenServer never returns to its `receive` loop, so the mailbox is never drained. On an idle-ish multiplexed socket we watched a node climb from a flat baseline to ~4.6 GB in ~20 s and get OOM-killed, with the Connection process's own heap at ~2.6 GB, blocked in `:ssl.send/2 → :tls_sender.call/2`. The stalled send was the **30 s heartbeat**, not an application push, so no app traffic is needed to trigger it.
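
For anyone checking whether their own Connection is in this state, a minimal diagnostic sketch follows. `BlockedProcessProbe` is a hypothetical helper (not part of Slipstream), and the spawned process is only a stand-in for the wedged Connection; against a live node you would pass the real Connection pid to `snapshot/1`.

```elixir
defmodule BlockedProcessProbe do
  # Hypothetical helper: collects the fields that make a wedged process visible.
  # Returns a keyword list, or nil if the process is no longer alive.
  def snapshot(pid) do
    Process.info(pid, [:status, :message_queue_len, :total_heap_size, :current_stacktrace])
  end

  # Stand-in for the blocked Connection: a process waiting on a message that
  # never arrives while unrelated messages pile up in its mailbox.
  def demo(message_count \\ 1_000) do
    pid = spawn(fn -> receive do: (:never_sent -> :ok) end)
    for n <- 1..message_count, do: send(pid, {:noise, n})

    info = snapshot(pid)
    Process.exit(pid, :kill)
    info
  end
end

IO.inspect(BlockedProcessProbe.demo(), label: "stand-in process")
```

On a wedged Connection, the `:current_stacktrace` entry is where the `:ssl.send/2` and `:tls_sender.call/2` frames mentioned above would be expected to show up, alongside a growing `:message_queue_len`.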

### Reproduce

Same spirit as #75, write-side: connect and complete the WebSocket upgrade against a peer that then **stops reading** (so the send buffer fills), and keep sending (the heartbeat alone is enough). The Connection process blocks in `stream_request_body/3`, its heap and mailbox grow, and it never reconnects. A `nc`-style listener that accepts, completes the upgrade, then stops draining reproduces it with `test_mode` off.
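
A rough sketch of such a listener in Elixir, under several assumptions: `StalledPeer` and port 4001 are hypothetical, it speaks plain `ws://` rather than TLS (so the stall would surface in `:gen_tcp.send/2` instead of `:ssl.send/2`), and the small `recbuf` is only a hint to the kernel to make the buffers fill sooner.

```elixir
defmodule StalledPeer do
  # Fixed GUID from RFC 6455, used to derive Sec-WebSocket-Accept.
  @ws_guid "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

  # Accept one client, answer the WebSocket upgrade, then keep the socket open
  # without ever reading from it again.
  def run(port \\ 4001) do
    opts = [:binary, packet: :raw, active: false, reuseaddr: true, recbuf: 4096]
    {:ok, listener} = :gen_tcp.listen(port, opts)
    {:ok, socket} = :gen_tcp.accept(listener)
    {:ok, request} = read_request(socket, "")
    :ok = :gen_tcp.send(socket, upgrade_response(request))

    # Passive mode and no further recv/2 calls: inbound data is never drained.
    Process.sleep(:infinity)
  end

  # Sec-WebSocket-Accept = base64(sha1(client key <> GUID)), per RFC 6455.
  def accept_key(client_key) do
    Base.encode64(:crypto.hash(:sha, client_key <> @ws_guid))
  end

  def upgrade_response(request) do
    [_, client_key] = Regex.run(~r/^sec-websocket-key:\s*(\S+)/im, request)

    "HTTP/1.1 101 Switching Protocols\r\n" <>
      "Upgrade: websocket\r\n" <>
      "Connection: Upgrade\r\n" <>
      "Sec-WebSocket-Accept: #{accept_key(client_key)}\r\n\r\n"
  end

  # Read until the blank line that ends the HTTP request headers.
  defp read_request(socket, acc) do
    if String.contains?(acc, "\r\n\r\n") do
      {:ok, acc}
    else
      case :gen_tcp.recv(socket, 0) do
        {:ok, data} -> read_request(socket, acc <> data)
        {:error, reason} -> {:error, reason}
      end
    end
  end
end
```

Running `StalledPeer.run()` in one shell and pointing a Slipstream client at `ws://localhost:4001/...` should leave the client connected to a peer that never drains. Small frames fill the buffers slowly, so pushing larger payloads from the client is likely to reach the blocked send much faster than waiting on heartbeats alone.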

### Workaround available today

Slipstream already forwards `mint_opts` to [`Mint.HTTP.connect`](https://github.com/CuatroElixir/slipstream/blob/v1.2.2/lib/slipstream/connection/impl.ex#L21), so a bounded send timeout can be passed straight through to the transport:

```elixir
connect(socket,
  uri: uri,
  mint_opts: [transport_opts: [send_timeout: 15_000, send_timeout_close: true]]
)
```

`send_timeout` bounds the `:ssl.send/2` call, and `send_timeout_close: true` closes the socket when that timeout fires. On expiry the send returns `{:error, :timeout}` → `push_message` returns `{:error, _, reason}` → Slipstream routes [`%Events.ChannelClosed{reason: {:send_failure, reason}}`](https://github.com/CuatroElixir/slipstream/blob/v1.2.2/lib/slipstream/connection/pipeline.ex#L241) → `handle_disconnect/2` → `reconnect/1`. The self-heal is restored and the heap can't run away.
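
The transport-level behaviour can be seen in isolation, without Slipstream or TLS, with plain `:gen_tcp` on loopback. This is only an illustration under those assumptions: `StalledSendDemo` is a hypothetical module, and the peer here is a local socket that is accepted and then never read.

```elixir
defmodule StalledSendDemo do
  # 1 MiB payload so the kernel and driver buffers fill after a few sends.
  @chunk :binary.copy(<<0>>, 1_048_576)

  # Connects to a local listener whose accepted socket is never read, then
  # sends until the bounded send gives up. Returns {:error, reason, bytes_accepted}.
  def run(send_timeout_ms \\ 500) do
    {:ok, listener} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
    {:ok, port} = :inet.port(listener)

    {:ok, client} =
      :gen_tcp.connect({127, 0, 0, 1}, port, [
        :binary,
        active: false,
        send_timeout: send_timeout_ms,
        send_timeout_close: true
      ])

    # Accept the connection but never call recv/2 on it: a peer that stopped draining.
    {:ok, peer} = :gen_tcp.accept(listener)

    result = send_until_error(client, 0)

    :gen_tcp.close(peer)
    :gen_tcp.close(listener)
    result
  end

  # Without send_timeout this loop would eventually block forever in send/2.
  defp send_until_error(socket, accepted) do
    case :gen_tcp.send(socket, @chunk) do
      :ok -> send_until_error(socket, accepted + byte_size(@chunk))
      {:error, reason} -> {:error, reason, accepted}
    end
  end
end

IO.inspect(StalledSendDemo.run(), label: "bounded send")
```

Once the buffers are full, the send is expected to fail after roughly `send_timeout_ms` instead of hanging, which is the error that lets the caller get back to its message loop.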

### Suggested fix

Either:

- **(a) Document** the `mint_opts: [transport_opts: [send_timeout: …, send_timeout_close: true]]` hardening prominently, and note that **without it the heartbeat-timeout self-heal does not protect against a stalled send** (a blocked send starves the detector). Cheapest, zero compatibility risk; or
- **(b) Default** a sane `send_timeout` on the transport (perhaps derived from the heartbeat interval) so connections self-heal on a half-open socket out of the box.
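
For option (b), one possible shape of the defaulting logic is sketched below. Everything in it is an assumption for discussion: `SendTimeoutDefaults` is a hypothetical module, half the heartbeat interval is an arbitrary choice, and it presumes `transport_opts` is a keyword list. Options the user already set are left untouched.

```elixir
defmodule SendTimeoutDefaults do
  # Hypothetical helper, not part of Slipstream: fills in send_timeout and
  # send_timeout_close under mint_opts[:transport_opts] unless already given.
  def put_default_send_timeout(config) do
    # Assumed fallback of 30 s when no heartbeat interval is configured.
    heartbeat = Keyword.get(config, :heartbeat_interval_msec, 30_000)
    defaults = [send_timeout: div(heartbeat, 2), send_timeout_close: true]

    mint_opts = Keyword.get(config, :mint_opts, [])
    transport_opts = Keyword.get(mint_opts, :transport_opts, [])

    # Keyword.merge/2 lets the second list win, so user-supplied values override defaults.
    merged = Keyword.merge(defaults, transport_opts)

    Keyword.put(config, :mint_opts, Keyword.put(mint_opts, :transport_opts, merged))
  end
end

config =
  SendTimeoutDefaults.put_default_send_timeout(
    uri: "ws://localhost:4000/socket/websocket",
    heartbeat_interval_msec: 30_000
  )

IO.inspect(config[:mint_opts], label: "mint_opts")
```

Deriving the value from the heartbeat interval keeps the send bounded below one heartbeat period, so a stalled send would fail before the next tick is due.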

Related: #75 (read-side / handshake twin) and #74 (send buffer) share the "bound the Connection's blocking I/O" theme.

**We're happy to open a PR — would you prefer the docs change (a) or the defaulted `send_timeout` (b)?** Let us know and we'll send it.

---

Env: slipstream 1.2.2, mint_web_socket 1.0.5, Erlang/OTP 29.0.2, Elixir 1.20.2-otp-29.

