sctp: discard pre-established DATA so webdartc↔webdartc data channels open #46

Merged
nus merged 1 commit into main from fix-dcep-open-race on Jun 9, 2026

nus (Owner) commented Jun 9, 2026

Fixes a real functional gap found during the bufferedAmount work: a data channel opened webdartc↔webdartc never reached open — the opener's onOpen never fired and readyState stayed connecting. (All existing DC e2e tests use a browser, so this went unnoticed.)

Root cause — a DCEP ACK lost to an SCTP establishment race

  1. The server peer establishes on COOKIE-ECHO and immediately sends the DCEP DATA_CHANNEL_OPEN, which overtakes the COOKIE-ACK and reaches the client still in COOKIE-ECHOED.
  2. _handleData processed that DATA regardless of state and fired onDataChannelOpen, but the DCEP ACK reply failed sendData's established guard and was silently dropped — yet the OPEN was still SACKed, so the sender never retransmitted. Permanent stall.

How libwebrtc/pion avoid this: pion's handleData returns early when state != established (RFC 4960 §5.1) — it discards the DATA without SACKing, so the sender's T3-rtx retransmits it after the receiver establishes.
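
The discard-without-SACK behaviour can be sketched as below. This is a minimal model, not the webdartc or pion code: `AssocState`, `Receiver`, `Sender` and `onT3RtxExpired` are hypothetical names invented for illustration, and a real T3-rtx timer is replaced by an explicit method call.

```dart
/// Association states relevant to the race (illustrative names).
enum AssocState { cookieEchoed, established }

/// Hypothetical stand-in for the receiving side of an association.
class Receiver {
  AssocState state = AssocState.cookieEchoed;

  /// TSNs handed to the upper layer (for example, DCEP).
  final List<int> delivered = [];

  /// TSNs acknowledged with a SACK.
  final List<int> sacked = [];

  /// Returns true if the DATA chunk was accepted and SACKed.
  bool handleData(int tsn) {
    if (state != AssocState.established) {
      // Discard: no delivery and, crucially, no SACK, so the sender
      // keeps the chunk outstanding and retransmits it on T3-rtx.
      return false;
    }
    delivered.add(tsn);
    sacked.add(tsn);
    return true;
  }
}

/// Hypothetical sender that retransmits every un-SACKed TSN on T3-rtx.
class Sender {
  final Set<int> outstanding = {};

  void send(int tsn, Receiver peer) {
    outstanding.add(tsn);
    if (peer.handleData(tsn)) outstanding.remove(tsn);
  }

  /// Stands in for the T3-rtx timer firing.
  void onT3RtxExpired(Receiver peer) {
    // Iterate over a copy because accepted TSNs are removed in the loop.
    for (final tsn in outstanding.toList()) {
      if (peer.handleData(tsn)) outstanding.remove(tsn);
    }
  }
}
```

In this model, a DATA chunk sent while the receiver is still in `cookieEchoed` stays in `outstanding`. Once the receiver reaches `established`, the next `onT3RtxExpired` call delivers it, which is the recovery path the fix relies on.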

Fix

  • _handleData (state_machine.dart) discards DATA received before established and does not SACK it — matching RFC 4960 §5.1 / pion. Correctness is now independent of chunk ordering (the peer's T3-rtx recovers).
  • Ordering optimization (peer_connection.dart): the DCEP OPEN send is deferred with scheduleMicrotask so the COOKIE-ACK is flushed to the wire first, which avoids the ~3s retransmit wait in the common case.
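
The ordering effect of the microtask deferral can be illustrated with a small sketch. It assumes, as the description implies but does not show, that the established callback runs synchronously before the COOKIE-ACK is written. `wire`, `handleCookieEcho`, `openEagerly`, `openDeferred` and `run` are hypothetical names, not webdartc APIs; only `scheduleMicrotask` is the real `dart:async` function.

```dart
import 'dart:async';

/// Hypothetical wire log: the order in which chunks are written.
final List<String> wire = [];

/// Sketch of the server side accepting a COOKIE-ECHO.
/// [onEstablished] stands in for the callback that opens data channels.
void handleCookieEcho(void Function() onEstablished) {
  // Assumption: the association is marked established and the callback
  // fires before the COOKIE-ACK reply is written.
  onEstablished();
  wire.add('COOKIE-ACK');
}

/// Sends the DCEP OPEN immediately, so it is written before the COOKIE-ACK.
void openEagerly() => wire.add('DATA_CHANNEL_OPEN');

/// Defers the DCEP OPEN to a microtask, so the current synchronous turn
/// (including the COOKIE-ACK write) completes first.
void openDeferred() =>
    scheduleMicrotask(() => wire.add('DATA_CHANNEL_OPEN'));

/// Runs one handshake with the given open strategy and returns the
/// resulting wire order.
Future<List<String>> run(void Function() open) async {
  wire.clear();
  handleCookieEcho(open);
  // A zero-duration timer fires only after pending microtasks have run.
  await Future<void>.delayed(Duration.zero);
  return List.of(wire);
}
```

Under these assumptions, `run(openEagerly)` records the OPEN ahead of the COOKIE-ACK, while `run(openDeferred)` records the COOKIE-ACK first. The deferral only improves the common case; the state guard in `_handleData` remains what makes the exchange correct if the chunks are reordered anyway.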

The two are layered deliberately: the guard is the load-bearing correctness fix; the microtask is a latency nicety.

Tests

  • New webdartc↔webdartc loopback test (data_channel_open_test.dart): both peers fire onOpen/ondatachannel and a message round-trips — completes in <1s.
  • dart analyze is clean; 665 unit + 22 e2e (Chrome/Firefox) tests pass; closes the BACKLOG item. A latent follow-up (gating _handleSack/_handleHeartbeat/_handleReconfig before established, for pion parity) is recorded in BACKLOG.

Reviewed with /simplify.

🤖 Generated with Claude Code

… open

A data channel opened webdartc↔webdartc never reached `open` — the opener's
onOpen never fired (readyState stuck at connecting). Root cause was a DCEP
ACK lost to an SCTP establishment race:

- The server peer establishes on COOKIE-ECHO and immediately sends the DCEP
  DATA_CHANNEL_OPEN; it overtakes the COOKIE-ACK and reaches the client
  still in COOKIE-ECHOED.
- `_handleData` processed that DATA regardless of state and fired
  onDataChannelOpen, but the DCEP ACK reply failed `sendData`'s
  `established` guard and was silently dropped — yet the OPEN was still
  SACKed, so the sender never retransmitted. Permanent stall.

Fix, matching RFC 4960 §5.1 and pion/sctp: `_handleData` now discards DATA
received before the association is established (and does NOT SACK it), so
the peer's T3-rtx retransmits it once we reach `established`. To avoid the
~3s retransmit wait in the common case, the DCEP OPEN send is deferred to a
microtask so the COOKIE-ACK is flushed to the wire first.

Tested: a new webdartc↔webdartc loopback test opens a channel (both peers
fire onOpen / ondatachannel) and exchanges a message — completes in <1s.
Full suite + e2e (Chrome/Firefox) pass; closes the BACKLOG item.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
nus merged commit 74b88d5 into main on Jun 9, 2026
15 checks passed
nus deleted the fix-dcep-open-race branch June 9, 2026 06:07