Skip to content

Skelf-Research/numaperf

Repository files navigation

numaperf

Crates.io Documentation License: MIT CI MSRV

NUMA-first runtime for latency-critical Rust applications.

numaperf gives you explicit control over memory placement, thread pinning, and work scheduling on NUMA systems. Stop guessing where your data lives and start guaranteeing it.

WebsiteDocumentationSkelf Research

Why numaperf?

On multi-socket servers, memory access latency varies by 2-3x depending on which CPU accesses which memory. Most applications ignore this, leading to unpredictable performance. numaperf makes NUMA a first-class concern:

Approach Limitation
First-touch policy Fragile. Initialization order determines placement. Refactoring breaks locality.
numactl / libnuma Process-level only. No per-region control, C API, no runtime observability.
NUMA-aware allocators Good for small objects, but doesn't address large buffers, scheduling, or cross-node traffic.
numaperf Explicit per-region placement, topology-aware scheduling, cross-node observability, hard-mode enforcement.

Quick Start

cargo add numaperf
use numaperf::{Topology, ScopedPin, NumaRegion, MemPolicy, NodeMask, Prefault};

fn main() -> Result<(), numaperf::NumaError> {
    // Discover NUMA topology
    let topo = Topology::discover()?;
    let node0 = topo.numa_nodes()[0].id();

    // Pin this thread to node 0's CPUs
    let _pin = ScopedPin::to_node(&topo, node0)?;

    // Allocate 1 GB bound to node 0
    let region = NumaRegion::anon(
        1024 * 1024 * 1024,
        MemPolicy::Bind(NodeMask::single(node0)),
        Default::default(),
        Prefault::Touch,
    )?;

    // region.as_mut_slice() is now guaranteed local to node 0
    println!("Allocated {} bytes on node {}", region.len(), node0);
    Ok(())
}

Features

  • Topology Discovery - Query NUMA nodes, CPUs, and inter-node distances at runtime
  • Thread Pinning - RAII-based CPU affinity with ScopedPin
  • Memory Placement - Explicit policies: Bind, Preferred, Interleave, Local
  • Work Scheduling - NumaExecutor with per-node worker pools and configurable work stealing
  • Sharded Data - NumaSharded<T> for per-node data structures, ShardedCounter for lock-free counting
  • Device Locality - Map NICs and NVMe devices to their NUMA nodes
  • Observability - Track locality ratios, generate health reports, identify cross-node traffic
  • Hard Mode - Strict enforcement when you need guarantees, graceful degradation when you don't

Use Cases

Database Engines - Pin buffer pools to specific nodes, schedule queries on data-local workers

Network Processing - Allocate packet buffers on the NIC's local node, process without cross-node copies

Scientific Computing - Partition large arrays across nodes, compute with guaranteed locality

Trading Systems - Eliminate latency variance from NUMA effects with strict pinning and placement

Documentation

Crate Structure

numaperf is organized as a workspace. Use the numaperf facade crate for everything, or pick individual crates:

Crate Purpose
numaperf Facade - re-exports all public APIs
numaperf-topo Topology discovery
numaperf-affinity Thread pinning
numaperf-mem Memory placement
numaperf-sched Work scheduling
numaperf-sharded Per-node data structures
numaperf-io Device locality
numaperf-perf Observability

Platform Support

Platform Support
Linux x86_64 Full
Linux aarch64 Full
macOS Graceful degradation (no NUMA hardware)

License

Licensed under the MIT License.

Contributing

Contributions are welcome! Please see our GitHub repository for:

For support, contact support@skelfresearch.com.


Built with care by Skelf Research.


Part of Skelf Research

numaperf is built by Skelf Research — an independent UK AI research lab publishing production-grade open-source projects.

🌐 Website · 📚 Documentation · 🔬 All projects · 🤗 Hugging Face

Related projects: zviz (container isolation) · gpuemu (GPU kernel correctness) · sigc (the quant's compiler)

Released under MIT / Apache-2.0. © Skelf Research Limited.