Aditya2545/superpoint-tensorrt

SuperPoint TensorRT Optimization

This repository optimizes the SuperPoint keypoint detector for edge deployment with TensorRT. The conversion path is PyTorch → ONNX → TensorRT FP16, and the target platform is NVIDIA Jetson Orin.

Pipeline

  1. Export SuperPoint PyTorch model to ONNX (export_onnx.py)
  2. Simplify ONNX graph with onnx-simplifier
  3. Compile TensorRT FP32 and FP16 engines (build_engines.py)
  4. Benchmark inference latency (benchmark.py)
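
Steps 1 and 2 can be sketched roughly as follows. This is not the contents of export_onnx.py, only a minimal illustration that assumes the caller already has a SuperPoint `torch.nn.Module` returning two tensors. The helper names, the intermediate file name `superpoint.onnx`, the 480×640 input size, the tensor names, and the opset version are all assumptions made for the example; only `superpoint_simplified.onnx` comes from the Setup section.

```python
def dummy_input_shape(height=480, width=640):
    """Shape of one grayscale image batch: (batch, channels, H, W).

    SuperPoint takes single-channel input; the 480x640 default is an assumption.
    """
    return (1, 1, height, width)


def export_and_simplify(model,
                        onnx_path="superpoint.onnx",  # hypothetical intermediate file
                        simplified_path="superpoint_simplified.onnx",
                        height=480, width=640):
    """Export `model` to ONNX, then simplify the graph with onnx-simplifier."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import onnx
    import torch
    from onnxsim import simplify

    model.eval()
    dummy = torch.randn(*dummy_input_shape(height, width))

    # Tensor names and opset are illustrative assumptions, not the repo's values.
    torch.onnx.export(
        model, dummy, onnx_path,
        input_names=["image"],
        output_names=["scores", "descriptors"],
        opset_version=17,
    )

    simplified, ok = simplify(onnx.load(onnx_path))
    if not ok:
        raise RuntimeError("onnx-simplifier could not validate the simplified graph")
    onnx.save(simplified, simplified_path)
    return simplified_path
```

A call such as `export_and_simplify(my_superpoint_model)` would then produce the simplified ONNX file that the engine build step consumes. A fixed input shape is used here because static shapes are usually the simplest case for TensorRT; dynamic axes would need extra configuration.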

Results

Hardware: NVIDIA Tesla T4, CUDA 13.0, TensorRT 10.11

| Backend       | Mean latency (ms) | P99 latency (ms) | FPS  |
|---------------|-------------------|------------------|------|
| PyTorch CUDA  | 14.66             | 15.28            | 68   |
| TensorRT FP32 | 0.61              | 1.87             | 1642 |
| TensorRT FP16 | 0.45              | 1.42             | 2203 |
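
The three columns above can be derived from a list of per-inference timings. The sketch below shows one plausible way to do that; it is not taken from benchmark.py. The function names, the warm-up count, and the nearest-rank definition of P99 are assumptions, and FPS is taken here as 1000 divided by the mean latency in milliseconds.

```python
import statistics
import time


def time_calls(fn, warmup=10, iters=100):
    """Return per-call wall-clock latencies in milliseconds.

    For GPU work, `fn` must itself block until the device has finished
    (for example by synchronizing), otherwise only launch time is measured.
    """
    for _ in range(warmup):  # warm-up runs are discarded
        fn()
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Compute mean latency, nearest-rank P99 latency, and FPS."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n)
    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": mean_ms,
        "p99_ms": ordered[rank - 1],
        "fps": 1000.0 / mean_ms,
    }
```

For example, `summarize(time_calls(run_inference))` would report the statistics for a hypothetical `run_inference` callable that executes one forward pass and waits for it to complete.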

TensorRT FP16 achieves a roughly 32× speedup in mean latency over the PyTorch CUDA baseline (14.66 ms → 0.45 ms) on the Tesla T4 listed above.
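
As a quick check of the arithmetic, the speedup figures follow directly from the mean latencies in the Results table. The snippet below only recomputes ratios from those reported numbers; the variable names are illustrative.

```python
# Mean latencies in milliseconds, copied from the Results table.
mean_ms = {
    "PyTorch CUDA": 14.66,
    "TensorRT FP32": 0.61,
    "TensorRT FP16": 0.45,
}

baseline = mean_ms["PyTorch CUDA"]

# Speedup is the ratio of baseline latency to optimized latency.
speedup_fp16 = baseline / mean_ms["TensorRT FP16"]
speedup_fp32 = baseline / mean_ms["TensorRT FP32"]
fp16_over_fp32 = mean_ms["TensorRT FP32"] / mean_ms["TensorRT FP16"]

print(f"FP16 vs PyTorch: {speedup_fp16:.1f}x")
print(f"FP32 vs PyTorch: {speedup_fp32:.1f}x")
print(f"FP16 vs FP32:    {fp16_over_fp32:.2f}x")
```

Most of the gain comes from moving to TensorRT at all; switching from FP32 to FP16 adds a smaller improvement on top of that.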

Setup

pip install torch onnx onnx-simplifier tensorrt pycuda
python export_onnx.py        # generates superpoint_simplified.onnx
python build_engines.py      # generates fp32 and fp16 .engine files
python benchmark.py          # runs latency benchmark
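
For readers who want to see what the engine build step might involve, here is a minimal sketch using the TensorRT Python API. It is not the contents of build_engines.py. The function names, the output file naming scheme, and the 1 GiB workspace limit are assumptions. `create_network(0)` relies on TensorRT 10 behaviour, where networks are always explicit-batch; older releases would need the explicit-batch flag.

```python
from pathlib import Path


def engine_path(onnx_path, precision):
    """Derive an engine file name from the ONNX file name (naming is an assumption)."""
    p = Path(onnx_path)
    return str(p.with_name(f"{p.stem}_{precision}.engine"))


def build_engine(onnx_path, precision="fp16", workspace_bytes=1 << 30):
    """Parse an ONNX file and write a serialized TensorRT engine to disk."""
    if precision not in ("fp32", "fp16"):
        raise ValueError(f"unsupported precision: {precision!r}")

    import tensorrt as trt  # imported lazily; requires a TensorRT install

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_bytes)
    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")

    out_path = engine_path(onnx_path, precision)
    with open(out_path, "wb") as f:
        f.write(serialized)
    return out_path
```

Calling `build_engine("superpoint_simplified.onnx", "fp32")` and then again with `"fp16"` would produce the two engines compared in the Results section. Engines are specific to the GPU and TensorRT version they were built with, so an engine built on a T4 would need to be rebuilt on a Jetson Orin.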

About

SuperPoint keypoint detector optimized for edge deployment: PyTorch → ONNX → TensorRT FP16. 32× speedup (14.66 ms → 0.45 ms, 2203 FPS) on Tesla T4. Targets the Jetson Orin (Ampere) FP16 pipeline for real-time visual SLAM.
