
ring: compute toSubmit from the kernel-consumed SQ head (liburing-style) — EBUSY can strand published SQEs #107

Description

@MDA2AV

Severity: low — latent edge case, likely rare-to-unreachable on modern kernels, but the fix is ~3 lines and matches the reference implementation.

The bookkeeping assumption

The guarantee chain for every op is staged → counted into an enter → consumed by the kernel → CQE. Step two rests on an assumption: every SQE counted into an enter is actually consumed by that enter.

Ring.SubmitAndWait computes what to submit from its own last-published tail (Ring.cs#L127-L142):

uint published = *_sqTail;            // our last-published tail
uint toSubmit  = _sqeTail - published;
if (toSubmit > 0) Volatile.Write(ref *_sqTail, _sqeTail);   // publish BEFORE enter
return io_uring_enter(_fd, toSubmit, waitFor, flags);

liburing instead derives pending work from the kernel-consumed head: sq_ready = tail − acquire-load(*khead), so entries that were published but not consumed by a previous enter are re-counted on the next call.
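The head-based count works because both indices are free-running 32-bit counters, so the unsigned difference stays correct even after the tail wraps past zero. A minimal sketch of that arithmetic follows; `SqMath` and `SqReady` are hypothetical names chosen for illustration, not part of Ring.cs or liburing.

```csharp
using System;

static class SqMath
{
    // Hypothetical helper: entries staged locally but not yet consumed by the kernel.
    // Both arguments are free-running uint counters; the subtraction wraps modulo 2^32,
    // so the result is correct even when sqeTail has wrapped and khead has not.
    public static uint SqReady(uint sqeTail, uint khead) => unchecked(sqeTail - khead);
}
```

For example, `SqMath.SqReady(2, uint.MaxValue - 1)` evaluates to 4: the head sits two entries before the wrap point and the tail two entries after it.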

Failure scenario

  1. Tail is published, io_uring_enter returns -EBUSY/-EAGAIN (e.g. CQ-overflow backpressure) having consumed zero entries. The loop tolerates the errno and continues (Reactor.Loop.SharedRing.cs#L57-L62).
  2. Next iteration: published == _sqeTail, so toSubmit = 0 — the stranded entries are in the ring but never counted into any subsequent to_submit.
  3. They drain only as later staged SQEs bump the count (the kernel consumes FIFO from its head): the 250 ms timer re-arm trickles them out roughly one per tick, each new SQE releasing one stranded op and stranding itself.

Consequence: operations delayed by seconds under exactly the conditions that produce -EBUSY (overload) — the worst possible timing. Not a leak, not a deadlock (the timer guarantees eventual drain), but a latency anomaly that would be near-impossible to diagnose without knowing this mechanism.
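The scenario above can be reproduced with a toy model that tracks only the three counters and makes no real syscalls. This is a sketch under stated assumptions: `ToyRing`, its fields, and the `Busy` flag are hypothetical stand-ins for the real ring state, and the fake `Enter` simply consumes up to `toSubmit` published entries, or none when it is simulating -EBUSY.

```csharp
using System;

// Toy model of the SQ bookkeeping: counters only, no kernel involved.
sealed class ToyRing
{
    public uint KernelHead;     // entries the simulated kernel has consumed
    public uint PublishedTail;  // stands in for *_sqTail
    public uint SqeTail;        // stands in for _sqeTail (locally staged)
    public bool Busy;           // when true, the next Enter fails once, like -EBUSY

    public void Stage(uint n) => SqeTail += n;

    // Simulated io_uring_enter: consumes FIFO from the head, at most toSubmit entries.
    int Enter(uint toSubmit)
    {
        if (Busy) { Busy = false; return -16; }      // -EBUSY, nothing consumed
        uint available = PublishedTail - KernelHead;
        uint n = Math.Min(toSubmit, available);
        KernelHead += n;
        return (int)n;
    }

    // Tail-based accounting, mirroring the snippet from SubmitAndWait.
    public int SubmitTailBased()
    {
        uint toSubmit = SqeTail - PublishedTail;
        PublishedTail = SqeTail;                     // publish before enter
        return Enter(toSubmit);
    }

    // Published entries the simulated kernel has not consumed yet.
    public uint Stranded => PublishedTail - KernelHead;
}
```

Staging three entries, setting `Busy`, and calling `SubmitTailBased()` leaves `Stranded` at 3. A second call submits zero and changes nothing. Staging one more entry and submitting again consumes exactly one entry from the head, so `Stranded` stays at 3: the new SQE releases one stranded op and takes its place.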

Fix

Compute the submit count from the kernel-visible head, liburing-style:

uint khead    = Volatile.Read(ref *_sqHead);   // kernel-written consumed head
uint toSubmit = _sqeTail - khead;              // everything staged and not yet consumed

Publish the tail as today; toSubmit then re-covers any stranded entries. Additionally, on success io_uring_enter's return value is the number of SQEs consumed: debug-assert that it equals toSubmit and count shortfalls in the per-reactor stats (#101).
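A toy model of the proposed accounting shows how the backlog recovers on the very next enter. As before, this is only a sketch: `HeadBasedRing`, `Busy`, and the `Shortfalls` counter are hypothetical names, the fake `Enter` stands in for io_uring_enter, and the model ignores memory ordering, which the real code handles with Volatile.Read and Volatile.Write. Counting error returns as shortfalls is an assumption of this sketch, not something the issue specifies.

```csharp
using System;

// Toy model of head-based submit accounting: counters only, no kernel involved.
sealed class HeadBasedRing
{
    public uint KernelHead;     // stands in for *_sqHead (kernel-consumed head)
    public uint PublishedTail;  // stands in for *_sqTail
    public uint SqeTail;        // stands in for _sqeTail (locally staged)
    public bool Busy;           // when true, the next Enter fails once, like -EBUSY
    public long Shortfalls;     // hypothetical stat: enters that consumed less than requested

    public void Stage(uint n) => SqeTail += n;

    // Simulated io_uring_enter: consumes FIFO from the head, at most toSubmit entries.
    int Enter(uint toSubmit)
    {
        if (Busy) { Busy = false; return -16; }      // -EBUSY, nothing consumed
        uint available = PublishedTail - KernelHead;
        uint n = Math.Min(toSubmit, available);
        KernelHead += n;
        return (int)n;
    }

    public int SubmitHeadBased()
    {
        uint toSubmit = SqeTail - KernelHead;        // everything staged and not yet consumed
        PublishedTail = SqeTail;                     // publish before enter
        int ret = Enter(toSubmit);
        if (ret < 0 || (uint)ret != toSubmit) Shortfalls++;   // error or partial consumption
        return ret;
    }

    // Published entries the simulated kernel has not consumed yet.
    public uint Stranded => PublishedTail - KernelHead;
}
```

With three entries staged and one simulated -EBUSY, the first `SubmitHeadBased()` records a shortfall and leaves `Stranded` at 3; the next call recomputes `toSubmit` as 3 from the head and drains the backlog without waiting for new SQEs.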

Notes


    Labels

    bug (Something isn't working), severity:low (Polish / minor win)
