NUMA-first runtime for latency-critical Rust applications.
numaperf gives you explicit control over memory placement, thread pinning, and work scheduling on NUMA systems. Stop guessing where your data lives and start guaranteeing it.
Website • Documentation • Skelf Research
On multi-socket servers, memory access latency varies by 2-3x depending on which CPU accesses which memory. Most applications ignore this, leading to unpredictable performance. numaperf makes NUMA a first-class concern:
| Approach | Limitation |
|---|---|
| First-touch policy | Fragile. Initialization order determines placement. Refactoring breaks locality. |
| numactl / libnuma | Process-level only. No per-region control, a C API, and no runtime observability. |
| NUMA-aware allocators | Good for small objects, but don't address large buffers, scheduling, or cross-node traffic. |
| numaperf | Explicit per-region placement, topology-aware scheduling, cross-node observability, hard-mode enforcement. |
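To make the latency claim above concrete, here is a minimal sketch of reading the inter-node distances that Linux exposes in `/sys/devices/system/node/nodeN/distance`. It uses only the standard library and is not numaperf's API; the function names `parse_distances` and `worst_case_ratio` are hypothetical. The values are relative figures reported by firmware (10 conventionally means local), so treat the ratio as a hint, not a measured latency.

```rust
/// Parse one line of `/sys/devices/system/node/nodeN/distance`.
/// Entry i is the relative distance from node N to node i.
/// Returns None if any token is not an unsigned integer.
fn parse_distances(line: &str) -> Option<Vec<u32>> {
    line.split_whitespace()
        .map(|tok| tok.parse::<u32>().ok())
        .collect()
}

/// Ratio of the largest distance to the local distance for node `local`.
/// Returns None for an empty table, an out-of-range index, or a zero local distance.
fn worst_case_ratio(distances: &[u32], local: usize) -> Option<f64> {
    let local_dist = *distances.get(local)?;
    let max_dist = *distances.iter().max()?;
    if local_dist == 0 {
        return None;
    }
    Some(f64::from(max_dist) / f64::from(local_dist))
}
```

On a two-socket machine the file for node 0 often contains something like `10 21`, which this sketch would report as a worst-case ratio of about 2.1.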
```shell
cargo add numaperf
```

```rust
use numaperf::{Topology, ScopedPin, NumaRegion, MemPolicy, NodeMask, Prefault};

fn main() -> Result<(), numaperf::NumaError> {
    // Discover NUMA topology
    let topo = Topology::discover()?;
    let node0 = topo.numa_nodes()[0].id();

    // Pin this thread to node 0's CPUs
    let _pin = ScopedPin::to_node(&topo, node0)?;

    // Allocate 1 GiB bound to node 0
    let region = NumaRegion::anon(
        1024 * 1024 * 1024,
        MemPolicy::Bind(NodeMask::single(node0)),
        Default::default(),
        Prefault::Touch,
    )?;

    // region.as_mut_slice() is now guaranteed local to node 0
    println!("Allocated {} bytes on node {}", region.len(), node0);
    Ok(())
}
```

- Topology Discovery - Query NUMA nodes, CPUs, and inter-node distances at runtime
- Thread Pinning - RAII-based CPU affinity with `ScopedPin`
- Memory Placement - Explicit policies: Bind, Preferred, Interleave, Local
- Work Scheduling - `NumaExecutor` with per-node worker pools and configurable work stealing
- Sharded Data - `NumaSharded<T>` for per-node data structures, `ShardedCounter` for lock-free counting
- Device Locality - Map NICs and NVMe devices to their NUMA nodes
- Observability - Track locality ratios, generate health reports, identify cross-node traffic
- Hard Mode - Strict enforcement when you need guarantees, graceful degradation when you don't
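The idea behind sharded counting in the list above can be sketched with the standard library alone. This is an illustration of the general technique, not numaperf's `ShardedCounter`; the type `ShardedCounterSketch` and its methods are hypothetical, and a 64-byte cache line is assumed. The sketch shows only the sharding: it does not place each shard in node-local memory.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One counter slot, aligned to its own cache line to avoid false sharing.
/// 64 bytes is an assumption; some CPUs use larger lines.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

/// Hypothetical sketch of a sharded counter: one slot per shard
/// (for example, one per NUMA node).
pub struct ShardedCounterSketch {
    shards: Vec<PaddedCounter>,
}

impl ShardedCounterSketch {
    pub fn new(num_shards: usize) -> Self {
        assert!(num_shards > 0, "need at least one shard");
        let shards = (0..num_shards)
            .map(|_| PaddedCounter(AtomicU64::new(0)))
            .collect();
        Self { shards }
    }

    /// Add `n` to the slot for `shard`, wrapping out-of-range indices.
    pub fn add(&self, shard: usize, n: u64) {
        let idx = shard % self.shards.len();
        self.shards[idx].0.fetch_add(n, Ordering::Relaxed);
    }

    /// Sum all slots. Under concurrent writers this is not an atomic snapshot.
    pub fn sum(&self) -> u64 {
        self.shards
            .iter()
            .map(|s| s.0.load(Ordering::Relaxed))
            .sum()
    }
}
```

Writers that each stick to their own shard index never contend on the same cache line; the cost moves to `sum`, which has to visit every shard.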
- Database Engines - Pin buffer pools to specific nodes, schedule queries on data-local workers
- Network Processing - Allocate packet buffers on the NIC's local node, process without cross-node copies
- Scientific Computing - Partition large arrays across nodes, compute with guaranteed locality
- Trading Systems - Eliminate latency variance from NUMA effects with strict pinning and placement
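For the network-processing case, the first step is finding which node a NIC is attached to. A minimal standard-library sketch, assuming Linux sysfs, is shown below. It does not use numaperf's device-locality API; `parse_numa_node` and `nic_numa_node` are hypothetical names. The kernel reports `-1` when it knows of no node affinity for the device, and virtual interfaces have no `device` entry at all.

```rust
use std::fs;
use std::path::Path;

/// Parse the contents of `/sys/class/net/<iface>/device/numa_node`.
/// Returns None for -1 (no known affinity) or unparsable input.
fn parse_numa_node(contents: &str) -> Option<u32> {
    let value: i32 = contents.trim().parse().ok()?;
    if value >= 0 {
        Some(value as u32)
    } else {
        None
    }
}

/// Look up the NUMA node of a network interface via sysfs.
/// Returns None if the file is missing or reports no affinity.
fn nic_numa_node(iface: &str) -> Option<u32> {
    let path = Path::new("/sys/class/net")
        .join(iface)
        .join("device/numa_node");
    let contents = fs::read_to_string(path).ok()?;
    parse_numa_node(&contents)
}
```

A caller would treat `None` as "no preference" and fall back to whatever default placement it uses elsewhere.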
- Getting Started - 5-minute tutorial
- Guides - How-to guides for common tasks
- API Reference - Complete API documentation
- Examples - Annotated code examples
numaperf is organized as a workspace. Use the numaperf facade crate for everything, or pick individual crates:
| Crate | Purpose |
|---|---|
| `numaperf` | Facade - re-exports all public APIs |
| `numaperf-topo` | Topology discovery |
| `numaperf-affinity` | Thread pinning |
| `numaperf-mem` | Memory placement |
| `numaperf-sched` | Work scheduling |
| `numaperf-sharded` | Per-node data structures |
| `numaperf-io` | Device locality |
| `numaperf-perf` | Observability |
| Platform | Support |
|---|---|
| Linux x86_64 | Full |
| Linux aarch64 | Full |
| macOS | Graceful degradation (no NUMA hardware) |
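One way graceful degradation on a platform without NUMA can look is sketched below: enumerate nodes from Linux sysfs and fall back to a single node 0 when nothing is found. This illustrates the general pattern under that assumption and is not how numaperf is confirmed to work; `detect_node_ids` is a hypothetical helper.

```rust
use std::fs;
use std::path::Path;

/// List NUMA node ids found under `root` (normally `/sys/devices/system/node`).
/// Falls back to a single node 0 if the directory is missing or has no
/// `nodeN` entries, as on macOS or a kernel built without NUMA support.
fn detect_node_ids(root: &Path) -> Vec<u32> {
    let mut ids: Vec<u32> = fs::read_dir(root)
        .map(|entries| {
            entries
                .filter_map(|entry| entry.ok())
                .filter_map(|entry| {
                    let file_name = entry.file_name();
                    let name = file_name.to_str()?;
                    // Keep only entries named "node<digits>".
                    let id = name.strip_prefix("node")?.parse::<u32>().ok();
                    id
                })
                .collect()
        })
        .unwrap_or_default();
    if ids.is_empty() {
        ids.push(0);
    }
    ids.sort_unstable();
    ids
}
```

Code written against the returned list runs unchanged on a laptop, where it sees one node, and on a multi-socket server.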
Licensed under the MIT License.
Contributions are welcome! Please see our GitHub repository.
For support, contact support@skelfresearch.com.
numaperf is built by Skelf Research — an independent UK AI research lab publishing production-grade open-source projects.
Related projects: zviz (container isolation) · gpuemu (GPU kernel correctness) · sigc (the quant's compiler)
Released under MIT / Apache-2.0. © Skelf Research Limited.