Skip to content

Use MSG_EOR to mark fragment boundaries#444

Merged
jdm merged 1 commit into
servo:mainfrom
nortti0:seqpacket-eor
Apr 2, 2026
Merged

Use MSG_EOR to mark fragment boundaries#444
jdm merged 1 commit into
servo:mainfrom
nortti0:seqpacket-eor

Conversation

@nortti0

@nortti0 nortti0 commented Mar 8, 2026

Copy link
Copy Markdown
Contributor

ipc-channel relies on the fragments being read as atomic chunks, which is not guaranteed to be true for plain recvmsg(2) on SOCK_SEQPACKET cross-platform. Namely, on FreeBSD it's possible for this to return both more and less than a single fragment.

SOCK_SEQPACKET can be thought of as a variant of SOCK_STREAM with the ability to set "end of record" markers using MSG_EOR. A single recvmsg(2) will not read past such a marker. While the fact that the marker has been reached is supposed to be communicated back to the caller through msg_flags, as of now Linux does not properly implement this on Unix-domain sockets.

To handle cases where the recvmsg(2) returns less than a single fragment, include the size of the fragment in the header and continue reads until the entire fragment has been read. Since SOCK_SEQPACKET maintains ordering, as long as sending the fragments is atomic, this will not will not run the danger of interleaving of different messages.

Empirically, sendmsg(2) as it is used by ipc-channel is atomic at least on FreeBSD. To make any possible resultant bugs easier to find, add asserts for that.

@nortti0 nortti0 force-pushed the seqpacket-eor branch 3 times, most recently from 2f49fb2 to fb26b6c Compare March 11, 2026 14:18
@nortti0

nortti0 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor Author

Investigating the failures here has lead me to discover that MSG_EOR is not properly implemented on Linux for Unix-domain sockets. Specifically, unix_seqpacket_sendmsg() delegates to unix_dgram_sendmsg() which doesn't do anything with the flag.

Linux using the dgram logic does mean that it mostly behaves as-if MSG_EOR had been passed, except that it will never signal its presence in msg_flags returned by recvmsg(2). To make it possible to detect that the first fragment has been fully read on both Linux and on systems that properly implement the MSG_EOR semantics on Unix-domain sockets, I extended the header of the fragment to include its own size.

I have also edited the initial pull request message to reflect the current state of the code, as afaiu it will become the final commit message.

@jdm jdm left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for figuring this out!

Comment thread src/platform/unix/mod.rs
// read, and afterwards whether it already got the entire message,
// or needs to receive additional fragments -- and if so, how much.
iovec {
iov_base: &fragment_size as *const _ as *mut c_void,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to use &mut here and skip the intermediate const pointer cast, but we can do that for both iovecs in a followup.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean &raw mut fragment_size as *mut c_void? I can't seem to get the compiler to be happy with a cast directly from a reference to a void pointer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we probably still need a *mut _ cast in between, but the status quo of casting an immutable reference to a mutable pointer is obviously wrong.

Comment thread src/platform/unix/mod.rs Outdated
Comment thread src/platform/unix/mod.rs Outdated
Comment thread src/platform/unix/mod.rs Outdated
ipc-channel relies on the fragments being read as atomic chunks, which
is not guaranteed to be true for plain recvmsg(2) on SOCK_SEQPACKET
cross-platform. Namely, on FreeBSD it's possible for this to return both
more and less than a single fragment.

SOCK_SEQPACKET can be thought of as a variant of SOCK_STREAM with the
ability to set "end of record" markers using MSG_EOR. A single
recvmsg(2) will not read past such a marker. While the fact that the
marker has been reached is supposed to be communicated back to the
caller through msg_flags, as of now Linux does not properly implement
this on Unix-domain sockets.

To handle cases where the recvmsg(2) returns less than a single
fragment, include the size of the fragment in the header and continue
reads until the entire fragment has been read. Since SOCK_SEQPACKET
maintains ordering, as long as sending the fragments is atomic, this
will not will not run the danger of interleaving of different messages.

Empirically, sendmsg(2) as it is used by ipc-channel is atomic at least
on FreeBSD. To make any possible resultant bugs easier to find, add
asserts for that.

Signed-off-by: Juhani Krekelä <juhani@krekelä.fi>
@nortti0

nortti0 commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

Applied the assert_eq!() changes.

@jdm jdm added this pull request to the merge queue Apr 2, 2026
Merged via the queue into servo:main with commit c5325f4 Apr 2, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants