e2e: fix teardown race in webdartc-closes-data-channel test #44

Merged
nus merged 1 commit into main from fix-e2e-close-teardown-race
Jun 8, 2026

@nus nus commented Jun 8, 2026

Owner

Fixes the intermittent Linux-CI failure of the Scenario 1 close test (webdartc closes the data channel → both peers observe onclose), seen on #43's run: waitFor dcClosed timed out after 10s.

Root cause (a teardown race in the e2e helper, not a library bug)

Closing a data channel is a bidirectional SCTP stream reset (RFC 8831 §6.7):

  1. webdartc sends an Outgoing SSN Reset Request.
  2. Chrome answers with a Re-config Response — this completes webdartc's reset, so our onClose fires — and sends its own reciprocal Outgoing reset request.
  3. Chrome fires its onclose (→ browser dcClosed) only once we answer that reciprocal request.

The offerer helper completed on its own onClose and immediately called pc.close(). The failing run's last inbound chunk was Chrome's Response alone (82 00 00 10 / 00 10 00 0c / …/ 00 00 00 01 = RE-CONFIG → Response param → result=1); Chrome's reciprocal request hadn't arrived yet. By the time it did, webdartc's transport was already torn down, so the response was never sent and Chrome's onclose never fired → dcClosed wait timed out.
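
To make the byte dump above easier to read, here is a minimal Python sketch that walks the parameters of an SCTP RE-CONFIG chunk (RFC 6525) and labels them. It is not webdartc code: `parse_reconfig` and `describe` are hypothetical names introduced for illustration, and the sketch assumes the input is a single, complete chunk. A chunk carrying only a Response parameter, as in the failing run, decodes to a single `("response", 1)` entry with no reset request alongside it.

```python
import struct

# Constants from RFC 6525 (SCTP stream reconfiguration).
RE_CONFIG_CHUNK = 0x82              # chunk type 130
OUTGOING_SSN_RESET_REQUEST = 0x000D  # parameter type 13
RECONFIG_RESPONSE = 0x0010           # parameter type 16


def parse_reconfig(chunk: bytes):
    """Return [(param_type, param_value_bytes), ...] for one RE-CONFIG chunk.

    Hypothetical helper for illustration only.
    """
    ctype, _flags, length = struct.unpack_from("!BBH", chunk, 0)
    if ctype != RE_CONFIG_CHUNK:
        raise ValueError(f"not a RE-CONFIG chunk: type=0x{ctype:02x}")
    params = []
    offset = 4  # skip the 4-byte chunk header
    while offset + 4 <= length:
        ptype, plen = struct.unpack_from("!HH", chunk, offset)
        if plen < 4 or offset + plen > length:
            raise ValueError("malformed parameter length")
        params.append((ptype, chunk[offset + 4:offset + plen]))
        offset += (plen + 3) & ~3  # parameters are padded to 4 bytes
    return params


def describe(chunk: bytes):
    """Label each parameter; Response parameters also report their result code."""
    out = []
    for ptype, value in parse_reconfig(chunk):
        if ptype == RECONFIG_RESPONSE:
            _response_seq, result = struct.unpack_from("!II", value, 0)
            out.append(("response", result))
        elif ptype == OUTGOING_SSN_RESET_REQUEST:
            out.append(("outgoing_reset_request", None))
        else:
            out.append(("other", ptype))
    return out
```

A log helper built this way could flag the case that matters here: a Response with no accompanying Outgoing SSN Reset Request means the peer's reciprocal reset is still in flight.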

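On fast runs (macOS, Windows, and most Linux runs), Chrome's Response and its reciprocal request arrive together or in quick succession, so webdartc answers before teardown. The test fails only when the reciprocal request lags behind the Response, hence the flakiness.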

Fix

Keep the PeerConnection alive ~2s after our onClose (only on the close path) so the SCTP stack can answer Chrome's reciprocal reset and the bidirectional close completes. Helper-only change — real apps keep the PeerConnection alive past close(). Note: bumping the dcClosed timeout alone would not help, since once the transport is gone the response can never be sent.
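
The effect of the grace period can be illustrated with a small asyncio simulation. This is a sketch under stated assumptions, not the actual helper change: `FakeTransport` and `run_close_path` are hypothetical stand-ins, and the delays are arbitrary test values rather than the ~2s used in the fix. It models only the one fact that matters: a reciprocal reset request that arrives after teardown can never be answered.

```python
import asyncio


class FakeTransport:
    """Stand-in for the SCTP transport (hypothetical, not webdartc's API)."""

    def __init__(self):
        self.open = True
        self.answered_reciprocal_reset = False

    def on_reciprocal_reset_request(self):
        # A torn-down transport cannot send the Re-config Response.
        if self.open:
            self.answered_reciprocal_reset = True

    def close(self):
        self.open = False


async def run_close_path(request_delay: float, grace: float) -> bool:
    """Return True if the remote peer would have observed its onclose.

    request_delay: seconds between our onClose and the arrival of the
                   peer's reciprocal reset request.
    grace:         seconds the helper waits after onClose before teardown.
    """
    transport = FakeTransport()
    loop = asyncio.get_running_loop()
    # Our onClose has just fired; the peer's request is still in flight.
    loop.call_later(request_delay, transport.on_reciprocal_reset_request)
    await asyncio.sleep(grace)  # the grace period added by the fix
    transport.close()
    # Wait long enough for a late request to land on the closed transport.
    await asyncio.sleep(max(0.0, request_delay - grace) + 0.01)
    return transport.answered_reciprocal_reset
```

With `grace=0.0` and any positive `request_delay`, the request lands on a closed transport and the function returns `False`. With a grace period longer than the delay it returns `True`. Lengthening the observer's wait changes neither outcome, which matches the note above about the dcClosed timeout.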

Verified: the close test passes repeatedly locally; no library code touched.

🤖 Generated with Claude Code

The Scenario 1 close test flaked on Linux CI (waitFor dcClosed timed out
after 10s). Root cause: the offerer helper closes the PeerConnection
immediately after its own DataChannel.onClose.

Closing a data channel is a bidirectional SCTP stream reset (RFC 8831
§6.7): webdartc sends an Outgoing SSN Reset Request, Chrome answers with a
Re-config Response (which completes webdartc's reset → our onClose fires)
AND sends its own reciprocal Outgoing reset request. Chrome only fires its
onclose once we answer that reciprocal request. The CI log's last inbound
chunk was Chrome's Response alone (0x82 / param 0x0010 / result=1) — its
reciprocal request hadn't arrived yet when our onClose fired and the helper
tore the transport down, so Chrome's request landed on a dead connection
and Chrome's onclose (the test's dcClosed) never came.

Keep the PeerConnection alive ~2s after our onClose (only on the close
path) so the SCTP stack can answer Chrome's reciprocal reset and the
bidirectional close completes. This is a helper-only fix — real apps keep
the PeerConnection alive past close() anyway. Bumping the dcClosed timeout
alone would not help: once the transport is gone the response can never be
sent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@nus nus merged commit 99615d8 into main Jun 8, 2026
15 checks passed
@nus nus deleted the fix-e2e-close-teardown-race branch June 8, 2026 19:47