Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow, TensorFlow Lite, ONNX Runtime, OpenCV DNN, MXNet, PyTorch, Apache TVM, ncnn, PaddlePaddle, etc.
First public benchmark of llama.cpp speculative decoding on Qwen3.6-35B-A3B with a single RTX 3090 (post PR #19493 merge, 2026-04-19). 19 configurations covering ngram-cache, ngram-mod, and classic draft with vocab-matched Qwen3.5-0.8B. Finding: no variant achieves net speedup on Ampere + A3B MoE. Raw JSON, plots, full reproducibility.
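The "no net speedup" finding above can be sanity-checked against the standard cost model for speculative decoding (Leviathan et al., 2023): whether drafting pays off depends on the acceptance rate and on how cheap the draft step is relative to a target verification pass. A minimal sketch, with illustrative numbers (the function and its parameters are hypothetical, not part of llama.cpp):

```python
def spec_decode_speedup(alpha: float, c: float, k: int) -> float:
    """Expected speedup of speculative decoding over plain autoregressive
    decoding, under the simplifying i.i.d.-acceptance assumption.

    alpha: per-token probability the target model accepts a draft token
    c:     draft cost ratio (time per draft token / time per target step)
    k:     draft tokens proposed per verification step
    """
    # Expected tokens emitted per verification step: 1 + alpha + ... + alpha^k
    expected_tokens = (1 - alpha ** (k + 1)) / (1 - alpha)
    # Wall-clock cost per step: k draft tokens plus one target verification pass
    cost = c * k + 1
    return expected_tokens / cost


# Cheap, well-matched draft: net win
print(spec_decode_speedup(alpha=0.8, c=0.1, k=4))  # > 1
# Low acceptance + relatively expensive draft (e.g. drafting for a fast
# MoE target): net slowdown, consistent with the Ampere + A3B result
print(spec_decode_speedup(alpha=0.3, c=0.3, k=4))  # < 1
```

Because an A3B MoE target activates few parameters per token, its per-step cost is already low, which inflates the effective cost ratio `c` and shrinks the region where any draft variant can win.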
Benchmark your GPU against any GGUF model and contribute to the public leaderboard. Measures throughput, TTFT, ITL, and VRAM limits across quantizations and context sizes.
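The three latency metrics named above all fall out of a single list of per-token timestamps. A minimal sketch of how they are typically derived (the helper and its argument names are illustrative, not this tool's API):

```python
def latency_metrics(token_times: list[float], prompt_end: float):
    """Derive TTFT, mean ITL, and decode throughput from per-token
    completion timestamps, all in seconds.

    token_times: monotonically increasing timestamp for each generated token
    prompt_end:  timestamp at which the prompt finished submitting
    """
    # Time to first token: prompt processing + first decode step
    ttft = token_times[0] - prompt_end
    # Inter-token latencies: gaps between consecutive generated tokens
    itls = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(itls) / len(itls) if itls else 0.0
    # Decode throughput excludes TTFT, so it reflects steady-state speed
    decode_time = token_times[-1] - token_times[0]
    throughput = (len(token_times) - 1) / decode_time if decode_time else 0.0
    return ttft, mean_itl, throughput


ttft, itl, tps = latency_metrics([1.20, 1.25, 1.30, 1.35], prompt_end=1.00)
print(f"TTFT={ttft:.2f}s  ITL={itl:.3f}s  throughput={tps:.0f} tok/s")
```

Separating TTFT from steady-state throughput matters because the two stress different resources: TTFT is dominated by prompt processing (compute-bound), while ITL reflects the memory-bandwidth-bound decode loop.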
Local benchmarking tool to explore Vision Transformer scaling and WSI-level inference constraints under real hardware settings.
Local-first Edge AI inference run registry and comparability checker for multi-target benchmark evidence.
Benchmark speculative decoding performance for Qwen3.6-35B-A3B on an RTX 3090 GPU using llama.cpp to evaluate model throughput and structural regressions.