Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow, TensorFlow Lite, ONNX Runtime, OpenCV DNN, MXNet, PyTorch, Apache TVM, ncnn, PaddlePaddle, etc.
First public benchmark of llama.cpp speculative decoding on Qwen3.6-35B-A3B with a single RTX 3090 (post PR #19493 merge, 2026-04-19). 19 configurations covering ngram-cache, ngram-mod, and classic draft with vocab-matched Qwen3.5-0.8B. Finding: no variant achieves net speedup on Ampere + A3B MoE. Raw JSON, plots, full reproducibility.
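The "no net speedup" finding above can be sanity-checked against the standard cost model for speculative decoding (Leviathan et al., 2023): whether drafting pays off depends on the acceptance rate and on how cheap the draft step is relative to a target verification pass. A minimal sketch, with illustrative numbers (the function and its parameters are hypothetical, not part of llama.cpp):

```python
def spec_decode_speedup(alpha: float, c: float, k: int) -> float:
    """Expected speedup of speculative decoding over plain autoregressive
    decoding, under the simplifying i.i.d.-acceptance assumption.

    alpha: per-token probability the target model accepts a draft token
    c:     draft cost ratio (time per draft token / time per target step)
    k:     draft tokens proposed per verification step
    """
    # Expected tokens emitted per verification step: 1 + alpha + ... + alpha^k
    expected_tokens = (1 - alpha ** (k + 1)) / (1 - alpha)
    # Wall-clock cost per step: k draft tokens plus one target verification pass
    cost = c * k + 1
    return expected_tokens / cost


# Cheap, well-matched draft: net win
print(spec_decode_speedup(alpha=0.8, c=0.1, k=4))  # > 1
# Low acceptance + relatively expensive draft (e.g. drafting for a fast
# MoE target): net slowdown, consistent with the Ampere + A3B result
print(spec_decode_speedup(alpha=0.3, c=0.3, k=4))  # < 1
```

Because an A3B MoE target activates few parameters per token, its per-step cost is already low, which inflates the effective cost ratio `c` and shrinks the region where any draft variant can win.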
Benchmark your GPU against any GGUF model and contribute to the public leaderboard. Measures throughput, TTFT, ITL, and VRAM limits across quantizations and context sizes.
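The three latency metrics named above all fall out of a single list of per-token timestamps. A minimal sketch of how they are typically derived (the helper and its argument names are illustrative, not this tool's API):

```python
def latency_metrics(token_times: list[float], prompt_end: float):
    """Derive TTFT, mean ITL, and decode throughput from per-token
    completion timestamps, all in seconds.

    token_times: monotonically increasing timestamp for each generated token
    prompt_end:  timestamp at which the prompt finished submitting
    """
    # Time to first token: prompt processing + first decode step
    ttft = token_times[0] - prompt_end
    # Inter-token latencies: gaps between consecutive generated tokens
    itls = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(itls) / len(itls) if itls else 0.0
    # Decode throughput excludes TTFT, so it reflects steady-state speed
    decode_time = token_times[-1] - token_times[0]
    throughput = (len(token_times) - 1) / decode_time if decode_time else 0.0
    return ttft, mean_itl, throughput


ttft, itl, tps = latency_metrics([1.20, 1.25, 1.30, 1.35], prompt_end=1.00)
print(f"TTFT={ttft:.2f}s  ITL={itl:.3f}s  throughput={tps:.0f} tok/s")
```

Separating TTFT from steady-state throughput matters because the two stress different resources: TTFT is dominated by prompt processing (compute-bound), while ITL reflects the memory-bandwidth-bound decode loop.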
Local benchmarking tool to explore Vision Transformer scaling and WSI-level inference constraints under real hardware settings.
Local-first Edge AI inference run registry and comparability checker for multi-target benchmark evidence.
Benchmark speculative decoding performance for Qwen3.6-35B-A3B on an RTX 3090 GPU using llama.cpp to evaluate model throughput and structural regressions.