
opt-in perf experiments: NAPI busy-poll (6.9+) and min-wait batched completions (6.12+) #103

Description

@MDA2AV

Severity: low / enhancement. Opt-in performance experiments, both per-reactor and orthogonal to the loop structure.

Two ring-level knobs worth exposing behind ServerConfig flags for A/B benchmarking:

1. NAPI busy-polling (kernel 6.9+)

IORING_REGISTER_NAPI enables per-ring NAPI busy-polling: instead of sleeping until the NIC raises an interrupt, the reactor spends CPU busy-polling the device's receive queues, which cuts recv-path p99 latency. It is a natural fit for the dedicated-cores deployment model this runtime already assumes, and pointless (and costly) on shared hosts, hence opt-in.

2. Min-wait batched completions (kernel 6.12+)

The loop currently parks with SubmitAndWait(1), which wakes on the first CQE. The 6.12 min-timeout wait (liburing io_uring_submit_and_wait_min_timeout; IORING_ENTER_EXT_ARG + getevents-arg plumbing) expresses "wake when ≥ N CQEs are ready, or after τ, whichever comes first": a bounded amount of added latency (at most τ) traded for larger completion batches per tick under moderate load. Under saturation, batches are naturally large, so it changes nothing; the interesting regime is the middle of the load curve.
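To make the trade-off concrete, here is a toy model in Python (not the runtime's code) comparing wake-on-first with a min-wait policy for one park of the loop. The names `Wake`, `wake_on_first`, and `wake_min_wait` are hypothetical, arrival times are microseconds measured from the start of the wait, and the model assumes that once τ has elapsed the wait returns as soon as at least one CQE is available. The overall wait timeout is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Wake:
    time_us: float  # when the waiter returns, relative to the start of the wait
    batch: int      # CQEs already posted at that moment


def _posted_by(arrivals_us: List[float], t: float) -> int:
    """Count CQEs posted at or before time t."""
    return sum(1 for a in arrivals_us if a <= t)


def wake_on_first(arrivals_us: List[float]) -> Wake:
    """Model of SubmitAndWait(1): return as soon as one CQE is posted."""
    first = min(arrivals_us)
    return Wake(first, _posted_by(arrivals_us, first))


def wake_min_wait(arrivals_us: List[float], n: int, tau_us: float) -> Wake:
    """Assumed min-wait model: return when n CQEs are posted; once tau_us has
    elapsed, return as soon as at least one CQE is available."""
    ts = sorted(arrivals_us)
    if len(ts) >= n and ts[n - 1] <= tau_us:
        t = ts[n - 1]           # the batch filled before the min-wait expired
    else:
        t = max(tau_us, ts[0])  # min-wait expired; wait for a first CQE if none yet
    return Wake(t, _posted_by(ts, t))


if __name__ == "__main__":
    arrivals = [5, 12, 30, 45, 200]
    print(wake_on_first(arrivals))
    print(wake_min_wait(arrivals, n=4, tau_us=50))
    print(wake_min_wait(arrivals, n=8, tau_us=50))
```

In this model the extra wait over wake-on-first never exceeds τ, and a ring with nothing pending before τ behaves exactly like wake-on-first, which is why the effect should concentrate in the middle of the load curve.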

Both amount to a config flag plus a few lines in Ring and the loop, and a wrk sweep (RPS + p50/p99 across connection counts) would show whether either earns default-on. Incremental mode already requires 6.12, so the min-wait floor is not a new constraint there.
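One way the flag plumbing could look, sketched in Python for brevity rather than in the runtime's own language. Every name here (`ServerConfig` fields, `RingCaps`, `WaitPlan`, `plan`) and every default value is hypothetical. The sketch assumes the ring setup code has already probed what the running kernel supports, and it falls back to the current wake-on-first behaviour when a requested feature is missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    # Hypothetical opt-in flags; both experiments default to off.
    napi_busy_poll: bool = False
    napi_busy_poll_usec: int = 50  # placeholder busy-poll budget
    min_wait: bool = False
    min_wait_cqes: int = 8         # placeholder N
    min_wait_usec: int = 50        # placeholder tau


@dataclass(frozen=True)
class RingCaps:
    # What the ring setup probe found on this kernel (assumed to exist).
    napi: bool
    min_wait: bool


@dataclass(frozen=True)
class WaitPlan:
    busy_poll_usec: int  # 0 means NAPI busy-poll is not registered
    wait_nr: int         # CQEs to wait for per park
    min_wait_usec: int   # 0 means plain wake-on-first


def plan(cfg: ServerConfig, caps: RingCaps) -> WaitPlan:
    """Combine requested flags with kernel capabilities into per-reactor settings."""
    busy = cfg.napi_busy_poll_usec if (cfg.napi_busy_poll and caps.napi) else 0
    if cfg.min_wait and caps.min_wait:
        return WaitPlan(busy, cfg.min_wait_cqes, cfg.min_wait_usec)
    # Flag off or kernel too old: keep today's SubmitAndWait(1) behaviour.
    return WaitPlan(busy, 1, 0)


if __name__ == "__main__":
    cfg = ServerConfig(napi_busy_poll=True, min_wait=True)
    print(plan(cfg, RingCaps(napi=True, min_wait=True)))
    print(plan(cfg, RingCaps(napi=False, min_wait=False)))
```

Keeping the two flags independent matches the A/B intent: each sweep can toggle one knob while the other stays at its default.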
