Run RF-DETR on NVIDIA DeepStream
This project provides the parsing libraries and configuration files needed to run RF-DETR models in NVIDIA DeepStream pipelines.
At the time of writing, the following features are supported:
- RF-DETR Nano, Small, Medium, Large
- FP32, FP16
The following features are a work in progress:
- INT8 calibration files
- RF-DETR for segmentation
- DLA validation
The project has been tested on the following DeepStream versions:
- DeepStream 8.0
- DeepStream 7.0
In a system with DeepStream installed, build the parser library:

```shell
make
```

This will generate:

```
libdeepstream-rfdetr.so
```

This is the library that must be configured in the custom-lib-path property of
the nvinfer element.
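For illustration, the relevant line in the nvinfer configuration would look like this (the path is an assumption; point it at wherever you built the library):

```
[property]
# Assumed install location; adjust to where libdeepstream-rfdetr.so was built
custom-lib-path=/path/to/libdeepstream-rfdetr.so
```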
The official RF-DETR weights are hosted by Roboflow and exposed via
ephemeral URLs; the recommended way to fetch them is through Roboflow's
inference package. We provide a small utility that does this for you:

- Install uv if you haven't already.
- Download the weights by running:

```shell
uv run ./download_weights.py MODEL_ID
```

where MODEL_ID is one of:
- rfdetr-base (deprecated)
- rfdetr-nano
- rfdetr-small
- rfdetr-medium
- rfdetr-large
The script will download the weights to the current working directory
as MODEL_ID.onnx.
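As a quick sketch of the expected result (the actual download requires uv and network access, so the command itself is shown as a comment):

```shell
# Download RF-DETR Small weights (requires uv and network access):
#   uv run ./download_weights.py rfdetr-small
# The script saves the weights in the current working directory,
# named after the model ID:
model_id="rfdetr-small"
weights_file="${model_id}.onnx"
echo "$weights_file"
```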
An example configuration file for NvInfer is provided in deepstream_rfdetr_bbox_config.txt.
The specific fields that make RF-DETR work are:
- net-scale-factor
- offsets
- custom-lib-path
- parse-bbox-func-name
- onnx-file / model-engine-file
- num-detected-classes
- model-color-format
- network-type
- maintain-aspect-ratio
- cluster-mode
- network-input-order
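As a hedged sketch, such a config could take the following shape. Every value below is illustrative, the parse-bbox-func-name is a hypothetical placeholder, and net-scale-factor, offsets, and maintain-aspect-ratio are omitted because they depend on the model's preprocessing; deepstream_rfdetr_bbox_config.txt remains the source of truth:

```
[property]
# Illustrative values only; copy the real ones from
# deepstream_rfdetr_bbox_config.txt.
onnx-file=rfdetr-nano.onnx
model-engine-file=rfdetr-nano.onnx_b1_gpu0_fp32.engine
custom-lib-path=libdeepstream-rfdetr.so
parse-bbox-func-name=NvDsInferParseRFDETR   # hypothetical name; use the real one
network-type=0                # 0 = detector
model-color-format=0          # 0 = RGB
network-input-order=0         # 0 = NCHW
cluster-mode=4                # 4 = no clustering (DETR-style models need no NMS)
num-detected-classes=90       # assumed COCO label count; match your model
```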
This config file works with the DeepStream sample apps as-is. A minimal pipeline that runs RF-DETR inference on a file and saves the result to another file is:

```shell
gst-launch-1.0 -e filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 \
! decodebin ! queue ! mux.sink_0 \
nvstreammux name=mux width=1920 height=1080 batch-size=1 ! \
nvinfer config-file-path=deepstream_rfdetr_bbox_config.txt ! \
queue ! nvdsosd ! nvv4l2h264enc ! h264parse ! queue ! mp4mux ! \
filesink location=OUTPUT.mp4
```

Remember to adjust the nvstreammux width and height properties to match the
image size of your input video.
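For instance, for a 1280x720 source (dimensions here are an assumption), the muxer portion of the pipeline would become:

```shell
# nvstreammux element description for a 720p source
# (WIDTH/HEIGHT are assumptions; set them to your video's size)
WIDTH=1280
HEIGHT=720
mux_line="nvstreammux name=mux width=${WIDTH} height=${HEIGHT} batch-size=1"
echo "$mux_line"
```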
The config file uses RF-DETR Nano by default. To change it to a different model size, modify the following properties in the config:
- onnx-file=/path/to/<model-id>.onnx
- model-engine-file=/path/to/<model-id>.onnx_b1_gpu0_fp32.engine
where <model-id> is one of the IDs listed above. You'll need to adjust the b1 (batch size) and fp32 (precision) portions of the engine file name according to the values set in the batch-size and network-mode properties, respectively.
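The engine file naming convention can be sketched like this; the network-mode-to-precision mapping (0 = FP32, 1 = INT8, 2 = FP16) follows the standard nvinfer property values:

```shell
# Sketch: expected engine file name for a given model/config combination.
# model_id and the settings below are example values.
model_id="rfdetr-medium"
batch_size=1          # batch-size property in the nvinfer config
network_mode=2        # network-mode property: 0 = fp32, 1 = int8, 2 = fp16
case "$network_mode" in
  0) precision="fp32" ;;
  1) precision="int8" ;;
  2) precision="fp16" ;;
esac
echo "${model_id}.onnx_b${batch_size}_gpu0_${precision}.engine"
```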
Every benchmark below was obtained with the following pipeline. The perf element used to measure FPS is available on GitHub.

```shell
gst-launch-1.0 -e filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! \
decodebin ! queue ! mux.sink_0 \
nvstreammux name=mux width=1920 height=1080 batch-size=1 ! queue ! \
nvinfer config-file-path=deepstream_rfdetr_bbox_config.txt ! \
perf ! fakesink
```

| Platform | DeepStream | Model | Batch | Precision | FPS |
|---|---|---|---|---|---|
| AGX Orin | 7.0 | rfdetr-nano | 1 | FP32 | 127 |
| AGX Orin | 7.0 | rfdetr-small | 1 | FP32 | 69 |
| AGX Orin | 7.0 | rfdetr-medium | 1 | FP32 | 52 |
| AGX Orin | 7.0 | rfdetr-large | 1 | FP32 | 16 |
| AGX Orin | 7.0 | rfdetr-base | 1 | FP32 | 47 |
| AGX Orin | 7.0 | rfdetr-nano | 1 | FP16 | 238 (*) |
| AGX Orin | 7.0 | rfdetr-small | 1 | FP16 | 151 (*)(**) |
| AGX Orin | 7.0 | rfdetr-medium | 1 | FP16 | 121 (*)(**) |
| AGX Orin | 7.0 | rfdetr-large | 1 | FP16 | 43 (*)(***) |
| AGX Orin | 7.0 | rfdetr-base | 1 | FP16 | 124 (*)(***) |
| DGX Spark | 8.0 | rfdetr-nano | 1 | FP32 | 266 |
| DGX Spark | 8.0 | rfdetr-small | 1 | FP32 | 135 |
| DGX Spark | 8.0 | rfdetr-medium | 1 | FP32 | 102 |
| DGX Spark | 8.0 | rfdetr-large | 1 | FP32 | 38 |
| DGX Spark | 8.0 | rfdetr-base | 1 | FP32 | 95 |
| DGX Spark | 8.0 | rfdetr-nano | 1 | FP16 | 488 |
| DGX Spark | 8.0 | rfdetr-small | 1 | FP16 | 270 |
| DGX Spark | 8.0 | rfdetr-medium | 1 | FP16 | 153 |
| DGX Spark | 8.0 | rfdetr-large | 1 | FP16 | 75 (***) |
| DGX Spark | 8.0 | rfdetr-base | 1 | FP16 | 195 (***) |
(*): Detection quality degrades considerably; make sure to compare results against the FP32 output.
(**): TRT warning: TensorRT encountered issues when converting weights between types, which could affect accuracy.
(***): TRT warning: running layernorm after self-attention in FP16 may cause overflow.
- Build the project as `make DEV=1`. This builds in debug mode and disallows warnings.
- Format the code by running `make format`.
- Lint the code by running `make lint`.
- Make sure all three are clean before submitting a PR.