Description
Hello TRTians! :)
I have a custom YOLOv4 object detector (single class only) that was trained in PyTorch and exported to ONNX. I am now trying to run it on my laptop GPU using the TensorRT engine exported by the trtexec binary. My ultimate goal is to run this on a Jetson Xavier, but I need to demonstrate that it works correctly on a laptop GPU first.
I am loading this serialized TRT engine binary into my C++ inference code (and the ONNX file too, for the network definition), but the outputs (confidence scores and bounding boxes) produced by the C++/TensorRT inference are way off, to the point that none of the confidence scores even crosses the 0.1 mark, so effectively no objects are detected at all.
The corresponding ONNX model, loaded with onnxruntime, produces correct inference results on the exact same image input (the preprocessed input tensor dumped from the C++ inference code), without any errors or warnings. So the ONNX file itself seems to be correct.
The only additional preprocessing I do in the Python onnxruntime code is expanding the dims: (3,416,416) → (1,3,416,416). I don't know whether a similar concept exists in the C++ counterpart when copying a single-batch input into a float pointer.
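For what it's worth, my understanding is that for batch size 1 the NCHW (1,3,416,416) buffer has exactly the same memory layout as the CHW (3,416,416) tensor, so no explicit "expand dims" is needed on the C++ side: a flat copy of the same 3·416·416 floats into the input binding should be equivalent. A minimal sketch of what I mean (the helper name `expandToBatchOne` is my own, just for illustration):

```cpp
#include <cstring>
#include <vector>

// For batch size 1, a contiguous NCHW (1,C,H,W) buffer is byte-identical to
// the CHW (C,H,W) tensor, so "expanding the dim" reduces to a flat copy of
// the same floats into the (batched) input buffer.
std::vector<float> expandToBatchOne(const std::vector<float>& chw) {
    std::vector<float> nchw(chw.size());  // 1 * C * H * W == C * H * W floats
    std::memcpy(nchw.data(), chw.data(), chw.size() * sizeof(float));
    return nchw;
}
```

So if this is right, the question reduces to whether my copy into the TensorRT input pointer preserves the CHW element order the model expects.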
The C++ inference code is a modified version of the sampleOnnxMNIST example, minus the network optimizations and serialization part.
Additional details and attachments are below.
Environment
TensorRT Version: 8.2.5.1
GPU Type: GTX 1060
Nvidia Driver Version: 470.129.06
CUDA Version: 10.2
CUDNN Version: 8
Operating System + Version: Ubuntu 18.04
Python Version: 3.8
PyTorch Version: 1.8
Container: baremetal (no Docker)
Relevant Files
I am attaching a Google Drive link here to the following files:
- The model ONNX file
- The trtexec log with the --verbose flag
- Part of the C++ inference code utilizing the serialized TRT engine binary
- The onnxruntime script used to validate the ONNX file and the inference outputs
- A sample PNG image and the corresponding NumPy input image tensor dump after preprocessing
Can the good people of Nvidia/the Internet help me figure out what I am doing wrong? :)
Best regards