How to optimize YOLOv5 inference performance on Jetson Nano (JetPack 4.6.1)

Hi,

I’m working on a project that involves deploying a custom-trained YOLOv5s model on the Jetson Nano 4GB Developer Kit for real-time object detection.

Here’s a breakdown of the current setup:

  • Device: Jetson Nano (4GB)

  • JetPack Version: 4.6.1

  • Camera: Logitech C270 (USB)

  • Model Format: YOLOv5s converted to ONNX

  • Software: PyTorch 1.11, ONNX Runtime 1.10, Python 3.6

What I’ve Done

  • Trained and exported YOLOv5s model to ONNX.

  • Confirmed ONNX model runs correctly on desktop.

  • Installed ONNX Runtime and ran inference on Jetson Nano.

  • Attempted optimization with TensorRT but didn’t see the expected improvements (rough commands sketched below).

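For reference, the export and engine-build steps were roughly along these lines (file names, image size, and workspace value are illustrative, not an exact record):

```
# Export the trained weights to ONNX with the stock YOLOv5 export script
python export.py --weights yolov5s_custom.pt --include onnx --imgsz 640

# Build a TensorRT engine from the ONNX file directly on the Nano
/usr/src/tensorrt/bin/trtexec --onnx=yolov5s_custom.onnx \
    --saveEngine=yolov5s_custom.engine --workspace=1024
```
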
What I Need Help With

  • Is there a recommended pipeline or script to run YOLOv5 ONNX models with TensorRT on Jetson Nano? (A sketch of the kind of pipeline I mean follows this list.)

  • What are the best practices for converting ONNX models to TensorRT (especially layer compatibility)?

  • How can I enable FP16 or INT8 inference on the Nano?

  • What input image resolution and batch size are optimal for achieving real-time performance?

  • Are there memory-saving techniques to deal with Jetson Nano’s limited 4GB RAM?

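To make the first question concrete, this is the kind of minimal TensorRT pipeline I have in mind. It is only a sketch: it assumes a prebuilt yolov5s.engine, assumes binding 0 is the input, and leaves out preprocessing, output decoding, and NMS.

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize an engine built beforehand, e.g. with trtexec --saveEngine=yolov5s.engine
with open("yolov5s.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(engine.get_binding_shape(i))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    bindings.append(int(dev))
    host_bufs.append(host)
    dev_bufs.append(dev)

stream = cuda.Stream()

def infer(image_chw):
    """Run one inference; image_chw is a preprocessed float32 array of shape (1, 3, H, W)."""
    np.copyto(host_bufs[0], image_chw.ravel())          # assumes binding 0 is the input
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host, dev in zip(host_bufs[1:], dev_bufs[1:]):  # copy all outputs back
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    return host_bufs[1:]  # raw YOLOv5 output(s); decoding and NMS still needed
```
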
I’m trying to get at least ~15 FPS for basic object detection with one camera input. I’ve seen mentions of DeepStream and other TensorRT wrappers — if those are better suited for this, I’d appreciate pointers to sample projects or official documentation.

Additional Info:

I’m using a standard Jetson Nano developer kit sourced locally, which includes the GPIO headers and accessories. I’m mentioning this in case the hardware variant affects optimization.

Thanks in advance for your support!

May I know how you converted your model to a TensorRT engine?

Could you quickly run trtexec with the --fp16 and --int8 flags to get inference performance numbers and see whether they meet your requirements?
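
For example, something along these lines should give timing numbers (yolov5s.onnx is a placeholder for your actual file; without a calibration cache the INT8 run uses dummy scales, so it is only meaningful for performance, not accuracy):

```
# FP16 timing run
/usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --fp16 --workspace=1024

# INT8 timing run (performance estimate only, no calibration)
/usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --int8 --workspace=1024
```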
