Severity: low — latent edge case, likely rare-to-unreachable on modern kernels, but the fix is ~3 lines and matches the reference implementation.
The bookkeeping assumption
The guarantee chain for every op is staged → counted into an enter → consumed by the kernel → CQE. Step two has an assumption:
Ring.SubmitAndWait computes what to submit from its own last-published tail (Ring.cs#L127-L142):
```csharp
uint published = *_sqTail; // our last-published tail
uint toSubmit = _sqeTail - published;
if (toSubmit > 0) Volatile.Write(ref *_sqTail, _sqeTail); // publish BEFORE enter
return io_uring_enter(_fd, toSubmit, waitFor, flags);
```
liburing instead derives pending work from the kernel-consumed head: sq_ready = tail − acquire-load(*khead), so entries that were published but not consumed by a previous enter are re-counted on the next call.
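The difference between the two counting rules can be seen on a snapshot of the indices. This is a minimal model that assumes free-running uint indices, as in the io_uring SQ ring. `SqState` and its property names are hypothetical and are not part of Ring.cs.

```csharp
// Hypothetical snapshot of SQ ring indices. All three are free-running uint
// counters (never masked), so differences are taken modulo 2^32.
public readonly record struct SqState(uint SqeTail, uint PublishedTail, uint KernelHead)
{
    // Current Ring.SubmitAndWait rule: only what was staged since our own
    // last publish of the tail.
    public uint PendingByPublishedTail => unchecked(SqeTail - PublishedTail);

    // liburing rule (sq_ready): everything staged that the kernel has not
    // yet consumed, i.e. tail minus the kernel-written head.
    public uint PendingByKernelHead => unchecked(SqeTail - KernelHead);
}
```

Consider an enter that published three entries and had none consumed (SqeTail = PublishedTail = 103, KernelHead = 100). `PendingByPublishedTail` is 0, while `PendingByKernelHead` is 3. If the head sits at 0xFFFFFFFE and the tail has wrapped to 2, unsigned subtraction still yields 4.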
Failure scenario
- Tail is published, then io_uring_enter returns -EBUSY/-EAGAIN (e.g. CQ-overflow backpressure) having consumed zero entries. The loop tolerates the errno and continues (Reactor.Loop.SharedRing.cs#L57-L62).
- Next iteration: published == _sqeTail, so toSubmit = 0. The stranded entries are in the ring but never counted into any subsequent to_submit.
- They drain only as later staged SQEs bump the count (the kernel consumes FIFO from its head): the 250 ms timer re-arm trickles them out roughly one per tick, each new SQE releasing one stranded op and stranding itself.
Consequence: operations delayed by seconds under exactly the conditions that produce -EBUSY (overload) — the worst possible timing. Not a leak, not a deadlock (the timer guarantees eventual drain), but a latency anomaly that would be near-impossible to diagnose without knowing this mechanism.
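The scenario can be replayed with a small simulation. This is a hypothetical model, not Reactor code: `SimRing`, `TicksToDrain` and the one-SQE-per-tick workload are assumptions. The simulated kernel consumes up to to_submit entries FIFO from its head, or nothing when the call is made to fail with -EBUSY.

```csharp
using System;

// Hypothetical model of the failure scenario; it mirrors the accounting of
// Ring.SubmitAndWait, not its code. Indices start at 0 and stay small, so
// plain comparisons are safe here (a real ring must compare modulo 2^32).
public sealed class SimRing
{
    private readonly bool _useKernelHead;
    public uint SqeTail { get; private set; }       // staged by us
    public uint PublishedTail { get; private set; } // models *_sqTail
    public uint KernelHead { get; private set; }    // models *_sqHead

    public SimRing(bool useKernelHead) => _useKernelHead = useKernelHead;

    // Entries published but not yet consumed by the simulated kernel.
    public uint Stranded => PublishedTail - KernelHead;

    public void Stage(uint n) => SqeTail += n;

    // Returns how many entries the simulated kernel consumed.
    public uint SubmitAndWait(bool kernelBusy)
    {
        uint toSubmit = _useKernelHead
            ? SqeTail - KernelHead       // fixed: liburing-style count
            : SqeTail - PublishedTail;   // current: since our last publish
        PublishedTail = SqeTail;         // publish BEFORE enter
        if (kernelBusy) return 0;        // -EBUSY/-EAGAIN with zero consumed
        // The kernel consumes FIFO from its head, never past the published tail.
        uint consumed = Math.Min(toSubmit, PublishedTail - KernelHead);
        KernelHead += consumed;
        return consumed;
    }
}

public static class StrandingDemo
{
    // Strands `ops` entries with one zero-consumed enter, then runs ticks that
    // each stage a single SQE (e.g. a timer re-arm). Returns the tick on which
    // the last stranded entry is consumed, or -1 if it never is.
    public static int TicksToDrain(bool useKernelHead, uint ops, int maxTicks = 1000)
    {
        var ring = new SimRing(useKernelHead);
        ring.Stage(ops);
        ring.SubmitAndWait(kernelBusy: true);
        uint lastStranded = ring.SqeTail;
        for (int tick = 1; tick <= maxTicks; tick++)
        {
            ring.Stage(1);
            ring.SubmitAndWait(kernelBusy: false);
            if (ring.KernelHead >= lastStranded) return tick;
        }
        return -1;
    }
}
```

Under this model, `TicksToDrain(useKernelHead: false, ops: 8)` returns 8, which is about 2 s at one tick per 250 ms. The head-based count returns 1. The model also shows that the lag never shrinks under tail-based accounting. After each tick `Stranded` is still 8; only the set of entries that are stranded changes.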
Fix
Compute the submit count from the kernel-visible head, liburing-style:
```csharp
uint khead = Volatile.Read(ref *_sqHead); // kernel-written consumed head
uint toSubmit = _sqeTail - khead;         // everything staged and not yet consumed
```
Publish the tail as today; to_submit then re-covers any stranded entries. Additionally, io_uring_enter returns the number of SQEs it consumed: debug-assert that it equals toSubmit, and count shortfalls in the per-reactor stats (#101).
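Here is a sketch of the fixed submit path with the proposed shortfall accounting, under stated assumptions. `FixedSubmitter` and `SubmitStats` are hypothetical names. The kernel-head read and the enter syscall are injected as delegates so the logic can be exercised without a real ring. The sketch only counts shortfalls and marks where the proposed debug-assert would sit, because asserting there would fire on the -EBUSY case the counter is meant to record.

```csharp
using System;

// Hypothetical per-reactor counters; the real stats belong to #101.
// A reactor owns its ring, so plain (non-atomic) increments are assumed.
public sealed class SubmitStats
{
    public long Enters;
    public long Shortfalls;     // enters that consumed fewer SQEs than counted
    public long ShortfallSqes;  // total SQEs those enters left unconsumed
}

// Sketch of the fixed accounting. The two delegates stand in for
// Volatile.Read(ref *_sqHead) and io_uring_enter(_fd, toSubmit, ...), so the
// logic can run without a real ring; they are not Ring.cs APIs.
public sealed class FixedSubmitter
{
    private readonly Func<uint> _readKernelHead;
    private readonly Func<uint, int> _enter;

    public FixedSubmitter(Func<uint> readKernelHead, Func<uint, int> enter)
    {
        _readKernelHead = readKernelHead;
        _enter = enter;
    }

    public uint SqeTail { get; private set; }        // staged by us
    public uint PublishedTail { get; private set; }  // models *_sqTail
    public SubmitStats Stats { get; } = new SubmitStats();

    public void Stage(uint n) => SqeTail = unchecked(SqeTail + n);

    public int SubmitAndWait()
    {
        // Everything staged and not yet consumed, including entries a previous
        // enter published but left in the ring.
        uint toSubmit = unchecked(SqeTail - _readKernelHead());
        PublishedTail = SqeTail;                      // publish BEFORE enter
        int ret = _enter(toSubmit);
        uint consumed = ret > 0 ? (uint)ret : 0u;     // negative = -errno, none consumed
        Stats.Enters++;
        if (consumed < toSubmit)                      // spot for the proposed debug-assert
        {
            Stats.Shortfalls++;
            Stats.ShortfallSqes += toSubmit - consumed;
        }
        return ret;
    }
}
```

Consider a fake kernel that returns -EBUSY (-16) once and then consumes everything it is asked for. If you stage 8 entries and then 1 more, the second call submits 9 because the stranded entries are re-counted. The stats then show one shortfall of 8 SQEs.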
Notes
A larger CQ (IORING_SETUP_CQSIZE) shrinks the main -EBUSY trigger, and DEFER_TASKRUN on kernels ≥ 6.1 may make the zero-consumed case unreachable in practice. The fix is still free insurance and aligns the ring with the reference accounting.