A C++ project for object detection and instance segmentation inference using the RF-DETR model, with multiple inference backends (ONNX Runtime and TensorRT) and OpenCV.
- Dependencies
- Model Setup
- Installation
- Building
- Usage
- Configuration
- Technical Details
- Acknowledgements
## Dependencies

- C++20 Compiler: Clang++ 15 or compatible (e.g., `clang++-15`)
- CMake: Version 3.12 or higher
- OpenCV: Version 4.x (e.g., install via `sudo apt-get install libopencv-dev` on Ubuntu)
- Google Test: Version 1.12.1 (automatically fetched during build)
- Ninja: Optional but recommended (`sudo apt-get install ninja-build`)

ONNX Runtime backend:

- ONNX Runtime: Version 1.21.0 (automatically downloaded during build)
- Platform: Linux, Windows, macOS
- Acceleration: CPU and GPU (CUDA/DirectML)

TensorRT backend:

- TensorRT: Version 10.x or 8.x+ (automatically downloaded during build if not found)
- CUDA Toolkit: Version 13.x was used; must be installed manually
- Platform: Linux with NVIDIA GPU
- Acceleration: NVIDIA GPU only
- Note: TensorRT libraries are automatically configured with RPATH, so no `LD_LIBRARY_PATH` is needed
## Model Setup

This project supports both RF-DETR detection and segmentation models from Roboflow.

1. Visit the RF-DETR Repository:
   - Go to the RF-DETR GitHub repository for model details.
   - Read the Roboflow blog for an overview.

2. Download the ONNX Model:
   - Follow the instructions in the export documentation to export models in ONNX format.
   - Tested with `rfdetr[onnxexport]==1.3.0` (Python ≤ 3.11 required).
   - Detection models: export with the standard configuration (outputs: `dets`, `labels`).
   - Segmentation models: export with the segmentation configuration (outputs: `dets`, `labels`, `masks`).
   - Place the model (e.g., `inference_model.onnx`) in a chosen directory.

3. Prepare the COCO Labels:
   - Create a `coco-labels-91.txt` file with one label per line:

     ```
     person
     bicycle
     car
     motorbike
     aeroplane
     ...
     ```
## Installation

Install the build dependencies (Ubuntu example):

```bash
sudo apt-get update
sudo apt-get install -y clang-15 libopencv-dev ninja-build cmake
```

Ensure `clang++-15` is available as your compiler.
## Building

This project uses compile-time backend selection. Choose your backend when building:
| Backend | Best For | Pros | Cons |
|---|---|---|---|
| ONNX Runtime | Development, CPU inference | Cross-platform, easy setup | Slower than TensorRT on GPU |
| TensorRT | Production on NVIDIA GPUs | Maximum performance | GPU-only, requires CUDA/TensorRT |
Important: Only ONE backend can be enabled at a time. The backend is compiled into the binary for optimal performance and smaller binary size.
### ONNX Runtime (default)

```bash
cmake -S . -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=/usr/bin/clang-15 \
  -DCMAKE_CXX_COMPILER=/usr/bin/clang++-15
cmake --build build --parallel
```

### TensorRT

```bash
cmake -S . -B build -G Ninja \
  -DUSE_ONNX_RUNTIME=OFF \
  -DUSE_TENSORRT=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=/usr/bin/clang-15 \
  -DCMAKE_CXX_COMPILER=/usr/bin/clang++-15
cmake --build build --parallel
```

What happens:

- TensorRT 10.13.3.9 is automatically downloaded if not found
- Libraries are configured with RPATH, so there is no need to set `LD_LIBRARY_PATH`
- The executable will use TensorRT for inference
- Requires CUDA 12.x (or 11.x+) installed manually
- Pre-built `.engine` or `.trt` files are loaded directly, skipping ONNX-to-TensorRT conversion
Build options:

- `-DUSE_ONNX_RUNTIME=ON/OFF`: Enable the ONNX Runtime backend (default: ON)
- `-DUSE_TENSORRT=ON/OFF`: Enable the TensorRT backend (default: OFF)
- `-DCMAKE_BUILD_TYPE=Release/Debug`: Build configuration
## Usage

You will need:

- The RF-DETR model file (`.onnx` for ONNX Runtime, `.onnx`/`.engine`/`.trt` for TensorRT)
- An input image (e.g., `image.jpg`)
- A COCO labels file (e.g., `coco-labels-91.txt`)
After building the project, run the inference application:

```bash
./build/inference_app /path/to/model.onnx /path/to/image.jpg /path/to/coco-labels-91.txt
```

For segmentation models, add the `--segmentation` flag:

```bash
./build/inference_app /path/to/model.onnx /path/to/image.jpg /path/to/coco-labels-91.txt --segmentation
```

If you have a pre-built TensorRT engine file (`.engine` or `.trt`), use it directly:

```bash
./build/inference_app /path/to/model.engine /path/to/image.jpg /path/to/coco-labels-91.txt --segmentation
```

Features:

- The output image is saved as `output_image.jpg`
- Detection/segmentation results (bounding boxes, labels, scores, and mask pixels) are printed to the console
- Input resolution is automatically detected from the model (supports 432x432, 560x560, etc.)
- Segmentation mode draws colored masks with transparency overlays
- Uses top-k selection (default: 300 detections) for efficient processing
## Configuration

The inference engine supports various configuration options that can be modified in `src/main.cpp`:

- Model Type: `ModelType::DETECTION` or `ModelType::SEGMENTATION`
- Resolution: Set to `0` for auto-detection from the model, or specify manually (e.g., `432`, `560`)
- Confidence Threshold: Default `0.5` (adjustable in `Config::threshold`)
- Max Detections: Default `300` for top-k selection (adjustable in `Config::max_detections`)
- Mask Threshold: Default `0.0` for binary mask generation (adjustable in `Config::mask_threshold`)
- Normalization: ImageNet mean `[0.485, 0.456, 0.406]` and std `[0.229, 0.224, 0.225]`

Example:

```cpp
Config config;
config.resolution = 0;           // Auto-detect
config.threshold = 0.6f;         // Higher confidence threshold
config.max_detections = 100;     // Fewer detections
config.mask_threshold = 0.5f;    // More conservative masks
config.model_type = ModelType::SEGMENTATION;
```

## Technical Details

Detection model outputs:

- `dets`: `float32[batch, num_queries, 4]`: Bounding boxes in `cxcywh` format (normalized)
- `labels`: `float32[batch, num_queries, num_classes]`: Class logits
Segmentation model outputs:

- `dets`: `float32[batch, num_queries, 4]`: Bounding boxes in `cxcywh` format (normalized)
- `labels`: `float32[batch, num_queries, num_classes]`: Class logits
- `masks`: `float32[batch, num_queries, mask_h, mask_w]`: Segmentation masks (e.g., 108x108)
Inference pipeline:

1. Preprocessing:
   - Resize the image to the model input resolution (auto-detected)
   - Convert BGR to RGB
   - Normalize with ImageNet statistics
   - Convert to CHW format

2. Inference:
   - Run the ONNX Runtime session
   - Auto-detect output tensor names from the model

3. Postprocessing:
   - Detection: select predictions above the confidence threshold
   - Segmentation:
     - Apply sigmoid to class logits
     - Top-k selection across all classes and queries
     - Resize masks to the original image dimensions using bilinear interpolation
     - Apply a threshold to create binary masks
   - Convert bounding boxes from `cxcywh` to `xyxy` format
   - Scale coordinates to the original image size

4. Visualization:
   - Draw bounding boxes with class labels
   - Overlay segmentation masks with transparency (alpha = 0.5)
   - Use deterministic colors based on class IDs
## Acknowledgements

- The RF-DETR model used in this project is sourced from Roboflow; special thanks to the Roboflow team. Check out their GitHub repository and site.
- Postprocessing implementation is based on Roboflow's reference implementations:
  - Detection postprocessing: `benchmark_rfdetr.py`
  - Instance segmentation postprocessing: `benchmark_rfdetr_seg.py`