• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 7.1
• JetPack Version (valid for Jetson only) : 6.1
• TensorRT Version : 8.6.2.3
• Issue Type (questions, new requirements, bugs) : question
Hello,
I am working with a PyTorch RetinaNet model that has a ResNet-50 backbone. I have exported the model to ONNX, and below are its properties as reported by Netron.app and polygraphy.
Here is the polygraphy output:
[I] ==== ONNX Model ====
Name: main_graph | ONNX Opset: 17
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=('batch_size', 3, 1080, 1920)]}
---- 2 Graph Output(s) ----
{cls_logits [dtype=float32, shape=('batch_size', 'Concatcls_logits_dim_1', 2)],
bbox_regression [dtype=float32, shape=('batch_size', 'Concatbbox_regression_dim_1', 4)]}
---- 195 Initializer(s) ----
---- 626 Node(s) ----
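For reference, the summary above was produced with polygraphy's model inspection command, roughly as follows (the model file name is a placeholder):

polygraphy inspect model retinanet_resnet50.onnx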
I would like to use this model in a DeepStream app. Since this is an object detection model, I implemented a custom bounding-box parser function in C++ to handle the outputs of the network. Here is my current implementation:
#include <algorithm>
#include <cmath>
#include <cstring>
#include <iostream>
#include <vector>

#include "nvdsinfer_custom_impl.h"

// Logistic sigmoid used to turn raw class logits into probabilities
static inline float sigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

extern "C" bool NvDsInferParseCustomRetinaNet(
    const std::vector<NvDsInferLayerInfo> &outputLayersInfo,
    const NvDsInferNetworkInfo &networkInfo,
    const NvDsInferParseDetectionParams &detectionParams,
    std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
    // Ensure output layers are valid
    if (outputLayersInfo.size() != 2)
    {
        std::cerr << "Error: Expected 2 output layers (class logits and bbox regression)." << std::endl;
        return false;
    }

    const NvDsInferLayerInfo *classLayer = nullptr;
    const NvDsInferLayerInfo *boxLayer = nullptr;
    for (const auto &layer : outputLayersInfo)
    {
        if (strcmp(layer.layerName, "cls_logits") == 0)
            classLayer = &layer;
        else if (strcmp(layer.layerName, "bbox_regression") == 0)
            boxLayer = &layer;
    }
    if (!classLayer || !boxLayer)
    {
        std::cerr << "Error: Missing class logits or bbox regression layers." << std::endl;
        return false;
    }

    // Observed dimensions of classLayer and boxLayer for this model:
    // classLayer->inferDims.numElements == 778410 (= 389205 * 2)
    // classLayer->inferDims.d[0] == 389205, classLayer->inferDims.d[1] == 2
    // boxLayer->inferDims.numElements == 1556820 (= 389205 * 4)
    // boxLayer->inferDims.d[0] == 389205, boxLayer->inferDims.d[1] == 4

    // Extract buffers from classLayer and boxLayer
    const float *classBuffer = static_cast<const float *>(classLayer->buffer);
    const float *boxBuffer = static_cast<const float *>(boxLayer->buffer);
    const int numClasses = classLayer->inferDims.d[1];
    const int numDetsToParse = classLayer->inferDims.numElements / numClasses;

    // Get parameters from config (the class-0 threshold is used for all classes here)
    const float confidenceThreshold = detectionParams.perClassPreclusterThreshold[0];

    // Temporary vectors intended to hold detections before NMS (NMS not implemented yet)
    std::vector<NvDsInferObjectDetectionInfo> allDetections;
    std::vector<bool> keep;

    // Iterate through all detections and process them
    objectList.clear();
    for (int i = 0; i < numDetsToParse; i++)
    {
        // Find the highest-scoring class for this prediction
        int maxClassId = 0;
        float maxScore = -1.0f;
        for (int c = 0; c < numClasses; c++)
        {
            float score = sigmoid(classBuffer[i * numClasses + c]);
            if (score > maxScore)
            {
                maxScore = score;
                maxClassId = c;
            }
        }

        // Filter by confidence threshold
        if (maxScore < confidenceThreshold)
            continue;

        // Get bounding box coordinates
        float x1 = boxBuffer[i * 4];
        float y1 = boxBuffer[i * 4 + 1];
        float x2 = boxBuffer[i * 4 + 2];
        float y2 = boxBuffer[i * 4 + 3];

        // Clip boxes to image boundaries
        x1 = std::max(0.0f, std::min(x1, (float)networkInfo.width));
        y1 = std::max(0.0f, std::min(y1, (float)networkInfo.height));
        x2 = std::max(0.0f, std::min(x2, (float)networkInfo.width));
        y2 = std::max(0.0f, std::min(y2, (float)networkInfo.height));

        // Create detection object
        NvDsInferObjectDetectionInfo detection;
        detection.classId = maxClassId;
        detection.detectionConfidence = maxScore;
        detection.left = x1;
        detection.top = y1;
        detection.width = x2 - x1;
        detection.height = y2 - y1;
        objectList.push_back(detection);
    }
    return true;
}
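For completeness, this is roughly how the parser is wired into gst-nvinfer in my config file (the file names, paths, and library name below are placeholders for my setup; only the parser-related keys matter here):

[property]
onnx-file=retinanet_resnet50.onnx
network-mode=2
network-type=0
num-detected-classes=2
cluster-mode=2
parse-bbox-func-name=NvDsInferParseCustomRetinaNet
custom-lib-path=/path/to/libnvdsinfer_customparser_retinanet.so

As far as I understand, cluster-mode=2 makes nvinfer run NMS on whatever detections the parser returns.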
Issue:
RetinaNet relies on anchors and a feature pyramid network (FPN) to predict bounding boxes, which means that directly interpreting the cls_logits and bbox_regression outputs does not yield correct bounding boxes without additional processing: bbox_regression holds per-anchor offsets rather than absolute coordinates.
In PyTorch, the model's forward function relies on the AnchorGenerator, RetinaNetHead, and GeneralizedRCNNTransform, among other components, to generate the final detections, and it additionally applies postprocess_detections (box decoding plus NMS) and a transform back to the original image size. Reimplementing this entire post-processing pipeline (anchor generation + box decoding + NMS) in C++ would require significant effort.
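To give a sense of what that would involve, here is a minimal, untested sketch of just the box-decoding step, assuming torchvision's default BoxCoder with weights (1, 1, 1, 1); the decodeBox helper and its anchor input are hypothetical, and the anchors themselves would still have to be regenerated in C++ exactly as AnchorGenerator produces them for each FPN level:

#include <cmath>

// Hypothetical helper: decode one (dx, dy, dw, dh) regression vector against its
// anchor (x1, y1, x2, y2), following torchvision's BoxCoder.decode_single with
// weights (1, 1, 1, 1). torchvision also clamps dw and dh to log(1000/16) before
// the exp; that is omitted here for brevity.
static void decodeBox(const float anchor[4], const float delta[4], float out[4])
{
    float aw  = anchor[2] - anchor[0];   // anchor width
    float ah  = anchor[3] - anchor[1];   // anchor height
    float acx = anchor[0] + 0.5f * aw;   // anchor center x
    float acy = anchor[1] + 0.5f * ah;   // anchor center y

    float cx = delta[0] * aw + acx;      // predicted center x
    float cy = delta[1] * ah + acy;      // predicted center y
    float w  = std::exp(delta[2]) * aw;  // predicted width
    float h  = std::exp(delta[3]) * ah;  // predicted height

    out[0] = cx - 0.5f * w;              // x1
    out[1] = cy - 0.5f * h;              // y1
    out[2] = cx + 0.5f * w;              // x2
    out[3] = cy + 0.5f * h;              // y2
}

One such anchor would be needed for every one of the 389205 predictions (for a 1920x1080 input that count matches 9 anchors per location summed over the P3 to P7 feature maps), and NMS would still have to run on the decoded boxes afterwards.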
I came across NVIDIA’s RetinaNet example repository, and surprisingly the nvdsparsebbox_retinanet.cpp file there does not include any anchor generation or backbone-related processing.
Questions:
- How does NVIDIA’s example handle anchors and the backbone if they are not explicitly present in the bounding-box parsing code?
- Is this functionality incorporated during the ONNX-to-TensorRT conversion, i.e. is anchor generation embedded directly into the TensorRT engine?
- How can I structure my custom bounding-box parsing function so that I do not need to handle anchors and the backbone manually, similar to NVIDIA’s RetinaNet example?
Looking forward to any insights on handling this efficiently within DeepStream!