observability: per-reactor counters (ENOBUFS, overflows, accepts, sheds, pool hit rate) #101

Description

@MDA2AV

Severity: low, but counters are the difference between diagnosable and mystery behavior for every pressure-path issue filed alongside this one.

Problem

There are no counters anywhere in the core. These failure modes currently either vanish silently or are reported only on stderr:

  • recv -ENOBUFS events (ENOBUFS issue) — today indistinguishable from peer disconnects
  • recv-queue overflow closes — today a bare Console.Error.WriteLine (Connection.Read.cs#L131)
  • CQ overflow — the cq_off.overflow counter is already mmapped via CqRingOffsets but never read
  • accept sheds at the connection cap (gid-exhaustion / hardening issues)
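
To illustrate the first and third bullets, here is a minimal sketch of how a reactor might tell an ENOBUFS recv apart from a peer disconnect and turn the kernel's cumulative CQ overflow counter into a per-sample delta. `RecvOutcome`, `PressureSignals`, `Classify`, and `OverflowDelta` are hypothetical names, not existing types in the core. It assumes the recv completion result follows the usual io_uring convention (positive for bytes, 0 for peer close, negative errno on failure) and that the value behind `cq_off.overflow` is an unsigned 32-bit counter that only increases.

```csharp
using System;

public enum RecvOutcome { Data, PeerClosed, NoBuffers, OtherError }

// Hypothetical helpers; nothing here exists in the core today.
public static class PressureSignals
{
    // errno value of ENOBUFS on Linux x86-64 and arm64 (assumption: other
    // architectures such as MIPS use a different number).
    public const int ENOBUFS = 105;

    // Maps a recv completion result to an outcome so that buffer exhaustion
    // can be counted separately from a peer closing the connection.
    public static RecvOutcome Classify(int cqeRes)
    {
        if (cqeRes > 0) return RecvOutcome.Data;
        if (cqeRes == 0) return RecvOutcome.PeerClosed;
        return cqeRes == -ENOBUFS ? RecvOutcome.NoBuffers : RecvOutcome.OtherError;
    }

    // `current` is the value read from the mmapped CQ ring at cq_off.overflow.
    // Returns how many overflows happened since the previous sample; unsigned
    // subtraction keeps the delta correct across 32-bit wraparound.
    public static uint OverflowDelta(uint current, ref uint last)
    {
        uint delta = unchecked(current - last);
        last = current;
        return delta;
    }
}
```

Sampling the overflow delta once per reactor loop iteration (or once per ticker interval) would be enough to feed a "CQ overflow (sampled)" counter.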

Suggested fix

A plain per-reactor stats struct whose fields are written only by the reactor thread and read without synchronization, so readers must tolerate torn reads. Suggested counters:

accepts, active connections, recvs, sends, flushes, ENOBUFS events, recv-queue overflows, CQ overflow (sampled), pool hit/miss, accept sheds, bytes in/out.

Expose as a public snapshot on Reactor plus an optional ticker-driven log line. No histogram machinery needed — counters alone cover the filed issues.
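
A rough sketch of what the struct and snapshot could look like. All names here (`ReactorStats`, `ReactorSketch`, `Snapshot`, `FormatLine`, and the `On...` hooks) are hypothetical and only mirror the counter list above; the real `Reactor` would increment the fields inline at the existing call sites.

```csharp
using System;

// Hypothetical layout: plain fields, no Interlocked, no locks.
public struct ReactorStats
{
    public long Accepts, ActiveConnections, Recvs, Sends, Flushes;
    public long EnobufsEvents, RecvQueueOverflows, CqOverflows;
    public long PoolHits, PoolMisses, AcceptSheds, BytesIn, BytesOut;

    // Derived on read so the hot path only ever increments integers.
    public double PoolHitRate
    {
        get
        {
            long total = PoolHits + PoolMisses;
            return total == 0 ? 0.0 : (double)PoolHits / total;
        }
    }
}

public sealed class ReactorSketch
{
    private ReactorStats _stats; // written only by the reactor thread

    public void OnAccept() { _stats.Accepts++; _stats.ActiveConnections++; }
    public void OnClose() { _stats.ActiveConnections--; }
    public void OnAcceptShed() { _stats.AcceptSheds++; }
    public void OnRecv(int bytes) { _stats.Recvs++; _stats.BytesIn += bytes; }
    public void OnSend(int bytes) { _stats.Sends++; _stats.BytesOut += bytes; }
    public void OnEnobufs() { _stats.EnobufsEvents++; }
    public void OnPoolRent(bool hit) { if (hit) _stats.PoolHits++; else _stats.PoolMisses++; }

    // Copies the struct by value. Callable from any thread; the fields are not
    // guaranteed to be mutually consistent, which is fine for diagnostics.
    public ReactorStats Snapshot() => _stats;

    // One possible shape for the optional ticker-driven log line.
    public static string FormatLine(int reactorId, ReactorStats s) =>
        FormattableString.Invariant(
            $"reactor={reactorId} accepts={s.Accepts} active={s.ActiveConnections} sheds={s.AcceptSheds} enobufs={s.EnobufsEvents} rq_overflow={s.RecvQueueOverflows} cq_overflow={s.CqOverflows} pool_hit={s.PoolHitRate:F2} in={s.BytesIn} out={s.BytesOut}");
}
```

Since `Snapshot()` returns a copy, a caller can diff two snapshots taken a second apart to get rates without the reactor doing any extra work.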

Side benefit: live per-reactor accept/connection counts make SO_REUSEPORT load imbalance directly visible, which is useful both operationally and for benchmarking the placement story.
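
As an illustration of the side benefit, per-reactor accept counts could be reduced to a single imbalance figure like this. `PlacementStats.Imbalance` is a hypothetical helper, and "busiest reactor divided by the mean" is just one possible metric, not something the issue prescribes.

```csharp
using System;

public static class PlacementStats
{
    // Ratio of the busiest reactor's count to the mean across reactors.
    // 1.0 means perfectly even placement; N means one of N reactors took
    // everything. Returns 1.0 when there is no data yet.
    public static double Imbalance(ReadOnlySpan<long> perReactor)
    {
        if (perReactor.Length == 0) return 1.0;

        long max = 0, sum = 0;
        foreach (long v in perReactor)
        {
            if (v > max) max = v;
            sum += v;
        }

        if (sum == 0) return 1.0;
        return max * (double)perReactor.Length / sum;
    }
}
```

For example, accept counts of 300, 100, 0, 0 across four reactors give an imbalance of 3.0, while four equal counts give 1.0.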
