This repository demonstrates a compact C++ computer vision inference pipeline built around NVIDIA Triton Inference Server. It exports a PyTorch object detection model and a scikit-learn classifier to ONNX, serves both through Triton, and runs end-to-end inference from a C++ application over a directory of images. Both the Triton model server and the C++ client application run in containers.
External C++ dependencies are included as git submodules. Fetch them after cloning with:

git submodule update --init --recursive

The repo includes Python notebooks that generate the example models and write the bundled model artifacts to mounts/models.
Set up the Python environment for the notebooks with uv sync or another package manager.
Code style follows the Google C++ style guide, and the repo is configured for clang-format and clang-tidy.
Building and running requires Docker or a comparable container engine such as Podman.
See the Makefile for details on the commands below.
Build the application image first:
make docker-build

Start Triton with the bundled model repository:
make docker-run-triton

Run the application against a directory of images:
make docker-run APP_ARGS='--image-dir=/app/input --model-name="torch_cnn_model" --classifier-model-name="sklearn_classifier"'

The default input and output mounts are mounts/in and mounts/out; override them with INPUT_DIR and OUTPUT_DIR if needed. The output is a simple JSON document containing the classification and detection results; see src/main.cpp for details.
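As a rough illustration of how an application might enumerate the images in --image-dir, here is a minimal C++17 sketch using std::filesystem. IsImagePath and ListImages are hypothetical names. The accepted extensions and the non-recursive, sorted walk are assumptions, and src/main.cpp may handle the directory differently.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: true if the path has a common image extension
// (case-insensitive). The extension list is an assumption for this sketch.
bool IsImagePath(const fs::path& path) {
  std::string ext = path.extension().string();
  std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return ext == ".jpg" || ext == ".jpeg" || ext == ".png" || ext == ".bmp";
}

// Hypothetical helper: collects regular image files directly inside `dir`
// (no recursion) and sorts them so repeated runs process files in the same order.
std::vector<fs::path> ListImages(const fs::path& dir) {
  std::vector<fs::path> images;
  for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
    if (entry.is_regular_file() && IsImagePath(entry.path())) {
      images.push_back(entry.path());
    }
  }
  std::sort(images.begin(), images.end());
  return images;
}
```

Sorting the paths is a design choice. It makes the order of results in the output reproducible, which is convenient when comparing runs.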
To run inference on the GPU, start the containers on a GPU-enabled Docker setup and pass --model-cuda-shared-mem=true in APP_ARGS. The flag only switches the Triton client to CUDA shared memory. GPU access itself still requires a host with NVIDIA drivers and the NVIDIA container runtime configured.
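The APP_ARGS values follow a --name=value convention, and --model-cuda-shared-mem takes a boolean. The sketch below shows one minimal way such arguments could be split and interpreted. ParseFlags and ParseBool are hypothetical. The accepted boolean spellings are an assumption, and the real parsing in src/main.cpp may rely on a CLI library with different rules.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: split "--key=value" arguments into a key/value map.
// Tokens that do not start with "--" or lack '=' are skipped here; a real
// CLI parser would usually report them as errors instead.
std::map<std::string, std::string> ParseFlags(const std::vector<std::string>& args) {
  std::map<std::string, std::string> flags;
  for (const std::string& arg : args) {
    if (arg.rfind("--", 0) != 0) continue;  // not a flag
    const std::size_t eq = arg.find('=');
    if (eq == std::string::npos) continue;  // "--key" without a value
    flags[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
  }
  return flags;
}

// Hypothetical sketch: interpret a boolean flag value such as the "true" in
// --model-cuda-shared-mem=true. Unrecognized values yield std::nullopt.
std::optional<bool> ParseBool(const std::string& value) {
  if (value == "true" || value == "1") return true;
  if (value == "false" || value == "0") return false;
  return std::nullopt;
}
```

Returning std::optional from ParseBool lets the caller treat a value like "yes" as a usage error rather than silently defaulting to CPU transport.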
Run the unit tests with:
make docker-test

To run only a subset of tests:
GTEST_FILTER='*VisionMapper*' make docker-test

The bundled Triton model repository lives in mounts/models and currently contains two example ONNX models. They are dummy models: their outputs are not meaningful and only serve to validate the end-to-end pipeline.
Triton can also use an optional model configuration file; see mounts/models/config.pbtxt for an example.
The code is split into three layers:
- src/main.cpp handles CLI parsing, walks the input image directory, invokes the inference pipeline, and writes the final JSON output.
- src/model_adapter_det.cpp and src/model_adapter_cls.cpp convert raw images into model-specific inputs, call Triton, and post-process outputs into application-level results.
- src/model_client.cpp owns the Triton gRPC client and hides CPU vs CUDA shared-memory transport details.
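The layering above can be pictured as an interface that the adapters call without knowing which transport is active. This is a hypothetical sketch. Tensor, InferenceTransport, EchoTransport and ScoreAdapter are illustrative names, not the repository's types, and no Triton client calls are shown. A real CPU or CUDA shared-memory implementation would sit behind Infer().

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tensor: name, shape, and row-major float data.
struct Tensor {
  std::string name;
  std::vector<int64_t> shape;
  std::vector<float> data;
};

// Hypothetical transport interface. One implementation could send tensors in
// the gRPC request (CPU path), another could use CUDA shared-memory regions;
// callers only ever see Infer().
class InferenceTransport {
 public:
  virtual ~InferenceTransport() = default;
  virtual std::vector<Tensor> Infer(const std::string& model_name,
                                    const std::vector<Tensor>& inputs) = 0;
};

// Test stand-in: returns the inputs unchanged as outputs.
class EchoTransport : public InferenceTransport {
 public:
  std::vector<Tensor> Infer(const std::string& /*model_name*/,
                            const std::vector<Tensor>& inputs) override {
    return inputs;
  }
};

// Hypothetical adapter: owns pre- and post-processing and depends only on the
// interface, so switching transports does not change adapter code.
class ScoreAdapter {
 public:
  ScoreAdapter(std::shared_ptr<InferenceTransport> transport, std::string model_name)
      : transport_(std::move(transport)), model_name_(std::move(model_name)) {}

  // Sends one feature vector as a [1, N] tensor and returns the index of the
  // highest score in the first output tensor.
  std::size_t Classify(const std::vector<float>& features) {
    Tensor input{"input", {1, static_cast<int64_t>(features.size())}, features};
    const std::vector<Tensor> outputs = transport_->Infer(model_name_, {input});
    const std::vector<float>& scores = outputs.at(0).data;
    if (scores.empty()) throw std::runtime_error("model returned no scores");
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i) {
      if (scores[i] > scores[best]) best = i;
    }
    return best;
  }

 private:
  std::shared_ptr<InferenceTransport> transport_;
  std::string model_name_;
};
```

Injecting the transport through the constructor is what allows unit tests to run without a Triton server. Tests exercise the adapter logic against a stand-in, while production code passes a real client.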
The detection adapter runs the object detection model directly on resized images. The classifier adapter first extracts a small feature vector in src/feature_extraction.cpp and then sends it to Triton as a second model request.
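To make the two preprocessing paths concrete, here is a hypothetical C++ sketch. ResizeNearest stands in for the resize before detection, and ChannelMeans stands in for a "small feature vector" before classification. Image, ResizeNearest and ChannelMeans are illustrative names. The actual resize method, target size, and features computed in src/feature_extraction.cpp are not described here and may differ entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interleaved 8-bit image in HWC layout.
struct Image {
  int width = 0;
  int height = 0;
  int channels = 0;
  std::vector<uint8_t> pixels;  // width * height * channels bytes
};

// Nearest-neighbour resize: each output pixel copies the source pixel whose
// integer-scaled coordinates it maps to.
Image ResizeNearest(const Image& src, int out_w, int out_h) {
  assert(src.width > 0 && src.height > 0 && out_w > 0 && out_h > 0);
  Image dst{out_w, out_h, src.channels,
            std::vector<uint8_t>(static_cast<std::size_t>(out_w) * out_h * src.channels)};
  for (int y = 0; y < out_h; ++y) {
    const int sy = y * src.height / out_h;
    for (int x = 0; x < out_w; ++x) {
      const int sx = x * src.width / out_w;
      for (int c = 0; c < src.channels; ++c) {
        dst.pixels[(static_cast<std::size_t>(y) * out_w + x) * src.channels + c] =
            src.pixels[(static_cast<std::size_t>(sy) * src.width + sx) * src.channels + c];
      }
    }
  }
  return dst;
}

// Example small feature vector: the mean of each channel, scaled to [0, 1].
std::vector<float> ChannelMeans(const Image& img) {
  assert(img.channels > 0 && !img.pixels.empty());
  const std::size_t channels = static_cast<std::size_t>(img.channels);
  std::vector<double> sums(channels, 0.0);
  for (std::size_t i = 0; i < img.pixels.size(); ++i) {
    sums[i % channels] += img.pixels[i];
  }
  const double count = static_cast<double>(img.pixels.size() / channels);
  std::vector<float> features;
  for (double s : sums) features.push_back(static_cast<float>(s / count / 255.0));
  return features;
}
```

In this arrangement the detection request carries a full image tensor, while the classifier request carries only a few floats. That difference is why the classifier runs as a separate, much lighter model call.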