Vision Inference on Edge

This repository demonstrates a compact C++ computer vision inference pipeline built around NVIDIA Triton Inference Server. Its goal is to export a PyTorch object detection model and a scikit-learn classifier to ONNX, serve both through Triton, and run end-to-end inference from a C++ application over a directory of images. Both the Triton model server and the C++ client application are containerized.

Setup

External C++ dependencies are included as git submodules.

git submodule update --init --recursive

The repo includes Python notebooks that generate the example models. Those notebooks produce the bundled model artifacts under mounts/models.

Set up the Python environment with uv sync or another package manager.

Code style follows the Google C++ style guide, and the repo is configured for clang-format and clang-tidy.

Building and running require Docker or a comparable container engine such as Podman.

Running

Check out the Makefile for further details on the commands below.

Build the application image first:

make docker-build

Start Triton with the bundled repository:

make docker-run-triton

Run the application against a directory of images:

make docker-run APP_ARGS='--image-dir=/app/input --model-name="torch_cnn_model" --classifier-model-name="sklearn_classifier"'

The default input and output mounts are mounts/in and mounts/out. Override them with INPUT_DIR and OUTPUT_DIR if needed. The output is a simple JSON file containing the classification and detection results; see src/main.cpp for details.
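As a rough illustration of what running over a directory of images involves, the sketch below enumerates image files in a folder with std::filesystem. It is not taken from src/main.cpp: the function names HasImageExtension and ListImages, and the list of accepted extensions, are assumptions made for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns true for a few common image extensions (an assumed list, compared
// case-insensitively).
bool HasImageExtension(const fs::path& path) {
  std::string ext = path.extension().string();
  std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return ext == ".jpg" || ext == ".jpeg" || ext == ".png";
}

// Collects image files directly inside `dir` (no recursion), sorted by path
// so that results come out in a stable order. Returns an empty list if `dir`
// is not a directory.
std::vector<fs::path> ListImages(const fs::path& dir) {
  std::vector<fs::path> images;
  if (!fs::is_directory(dir)) return images;
  for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
    if (entry.is_regular_file() && HasImageExtension(entry.path())) {
      images.push_back(entry.path());
    }
  }
  std::sort(images.begin(), images.end());
  return images;
}
```

Inside the container this would be called as ListImages("/app/input"), matching the --image-dir value in the command above. The code requires C++17 for std::filesystem.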

If you want Triton to use the GPU, run the containers on a GPU-enabled Docker setup and pass --model-cuda-shared-mem=true in APP_ARGS. That flag enables CUDA shared memory in the Triton client; actual GPU access still depends on a host with NVIDIA drivers and the NVIDIA container runtime configured.

Run the unit tests with:

make docker-test

To run only a subset of tests:

GTEST_FILTER='*VisionMapper*' make docker-test

Model Description

The bundled Triton repository lives in mounts/models and currently contains two example ONNX models. They are dummy models that produce no meaningful results; they exist only to validate the end-to-end pipeline.

Triton can also take an additional model config, as in mounts/models/config.pbtxt.

Architecture

The code is split into three layers:

src/main.cpp handles CLI parsing, walks the input image directory, invokes the inference pipeline, and writes the final JSON output.
src/model_adapter_det.cpp and src/model_adapter_cls.cpp convert raw images into model-specific inputs, call Triton, and post-process outputs into application-level results.
src/model_client.cpp owns the Triton gRPC client and hides CPU vs CUDA shared-memory transport details.
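The layering above can be sketched with a small interface boundary between the adapter and the client. This is a minimal illustration, not the repository's actual code: ModelClient, FakeModelClient, ClassifierAdapter, and Classification are hypothetical names, and the fake client stands in for the real Triton gRPC client so the example runs without a server.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transport layer: hides how tensors reach the model server.
class ModelClient {
 public:
  virtual ~ModelClient() = default;
  // Sends a flat float tensor to the named model and returns its raw output.
  virtual std::vector<float> Infer(const std::string& model_name,
                                   const std::vector<float>& input) = 0;
};

// Stand-in for a Triton-backed client; returns each input value doubled.
class FakeModelClient : public ModelClient {
 public:
  std::vector<float> Infer(const std::string& /*model_name*/,
                           const std::vector<float>& input) override {
    std::vector<float> output;
    output.reserve(input.size());
    for (float value : input) output.push_back(value * 2.0f);
    return output;
  }
};

// Hypothetical application-level result produced by the adapter layer.
struct Classification {
  std::size_t label = 0;
  float score = 0.0f;
};

// Hypothetical adapter: calls the client and post-processes raw scores
// into the index and value of the highest score (argmax).
class ClassifierAdapter {
 public:
  ClassifierAdapter(ModelClient& client, std::string model_name)
      : client_(client), model_name_(std::move(model_name)) {}

  Classification Classify(const std::vector<float>& features) {
    const std::vector<float> scores = client_.Infer(model_name_, features);
    Classification best;
    for (std::size_t i = 0; i < scores.size(); ++i) {
      if (i == 0 || scores[i] > best.score) {
        best.label = i;
        best.score = scores[i];
      }
    }
    return best;
  }

 private:
  ModelClient& client_;
  std::string model_name_;
};
```

Because the adapter depends only on the abstract client, a test can construct ClassifierAdapter with FakeModelClient and check the post-processing without starting Triton, while the top layer only ever sees Classification values.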

The detection adapter runs the object detection model directly on resized images. The classifier adapter first extracts a small feature vector in src/feature_extraction.cpp and then sends it to Triton as a second model request.
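To make the idea of a small feature vector concrete, the sketch below computes the mean of each color channel of an image. The features actually computed in src/feature_extraction.cpp are not described here, so both the RgbImage type and the choice of mean color as the feature are assumptions for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image container: 8-bit interleaved RGB, row-major.
struct RgbImage {
  std::size_t width = 0;
  std::size_t height = 0;
  std::vector<std::uint8_t> pixels;  // Expected size: width * height * 3.
};

// Returns the mean of each channel, scaled to [0, 1]. An empty image yields
// an all-zero feature vector.
std::array<float, 3> ExtractMeanColor(const RgbImage& image) {
  const std::size_t pixel_count = image.width * image.height;
  assert(image.pixels.size() == pixel_count * 3);

  std::array<float, 3> features{};  // Zero-initialized.
  if (pixel_count == 0) return features;

  // Accumulate in double to avoid precision loss on large images.
  std::array<double, 3> sums{};
  for (std::size_t i = 0; i < pixel_count; ++i) {
    for (std::size_t c = 0; c < 3; ++c) {
      sums[c] += image.pixels[i * 3 + c];
    }
  }
  for (std::size_t c = 0; c < 3; ++c) {
    features[c] = static_cast<float>(
        sums[c] / (255.0 * static_cast<double>(pixel_count)));
  }
  return features;
}
```

A fixed-length vector like this is what would then be sent to the classifier model as the second request, independent of the original image size.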
