Test .engine generated with DeepStream using TensorRT

Hi,
I have this model (for image classification) whose engine was generated by DeepStream (mymodel.onnx_b1_gpu0_fp32.engine), and I want to test it on images outside of DeepStream. How can I run inference with it directly in TensorRT?

(TensorRT version: 8.6.1)

    import pycuda.driver as cuda
    import pycuda.autoinit
    import tensorrt as trt
    import numpy as np
    from PIL import Image
    import requests
    from io import BytesIO

    # Load the TensorRT engine
    logger = trt.Logger(trt.Logger.WARNING)
    with open("torch_yolov11.onnx_b1_gpu0_fp32.engine", "rb") as f:
        engine_data = f.read()

    # Deserialize the engine
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(engine_data)

    # Create an execution context
    context = engine.create_execution_context()

    # Load an image
    image_url = "https://ultralytics.com/images/bus.jpg"
    response = requests.get(image_url)
    image = Image.open(BytesIO(response.content)).convert("RGB")

    # Preprocess the image (assuming the model expects 224x224 RGB input)
    image = image.resize((224, 224))
    image = np.array(image).astype(np.float32)
    image = image.transpose((2, 0, 1))     # HWC -> CHW
    image = np.expand_dims(image, axis=0)  # add batch dimension

    # Allocate device memory for the input and output buffers
    input_data = cuda.mem_alloc(image.nbytes)
    output_data = cuda.mem_alloc(int(engine.max_batch_size * np.prod(engine.get_binding_shape(1)) * np.float32().itemsize))

    # Ensure the input image is contiguous in memory
    image = np.ascontiguousarray(image)

    # Transfer the input data to the GPU
    cuda.memcpy_htod(input_data, image)

    # Run inference
    context.execute_v2([int(input_data), int(output_data)])

    # Retrieve the output from the GPU
    output = np.empty(shape=engine.get_binding_shape(1), dtype=np.float32)
    cuda.memcpy_dtoh(output, output_data)

    print("Inference result:", output)


I used this code, but when I print the output, it displays:
    [01/20/2025-16:53:44] [TRT] [E] 2: [softMaxV2Runner.cpp::execute::213] Error Code 2: Internal Error (Assertion y != nullptr failed. )
    Inference result: [[-0.74447846 0.75425875]]
    /tmp/ipykernel_46848/2110792327.py:5: DeprecationWarning: Use get_tensor_shape instead.
      output = np.empty(shape=engine.get_binding_shape(1), dtype=np.float32)

Why does the result not display as probabilities (SoftMax output)?
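
For reference, softmax applied on the host to the logits printed above would look like this (a minimal numpy sketch; the logit values are just the ones copied from the output above):

    import numpy as np

    # Softmax applied on the host to the raw logits printed above
    logits = np.array([[-0.74447846, 0.75425875]])
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(probs)  # approximately [[0.1826 0.8174]]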

Hi, I get this error when I try to process more than one image. The first execution runs correctly, but when I run the next execution, this error is displayed. How can I resolve this problem?

Error:

    LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered

Code:

    input_data = cuda.mem_alloc(image.nbytes)
    output_shape = engine.get_binding_shape(1)
    output_size = int(np.prod(output_shape) * np.float32().itemsize)
    output_data = cuda.mem_alloc(output_size)

    # Ensure the input image is contiguous in memory
    image = np.ascontiguousarray(image)

    # Transfer input data to GPU
    cuda.memcpy_htod(input_data, image)

    # Run inference
    context.execute_v2([int(input_data), int(output_data)])

    # Allocate the host output buffer
    output = np.empty(output_shape, dtype=np.float32)

    # Retrieve output from GPU
    cuda.memcpy_dtoh(output, output_data)
    cuda.Context.synchronize()
    # Free device memory after inference (PyCUDA releases the allocation
    # once the last reference to it is dropped)
    del input_data
    del output_data

Any response?

Hi @amine.ghamgui.anavid, thanks for your patience. I've routed this to the right team and will get back to you with a response. I'm moving your post to the DeepStream forum so you can get help there!

The engine file generated by DeepStream nvinfer is a standard TensorRT engine file. There are samples showing how to run inference with the TensorRT Python API: TensorRT/samples/python at release/10.7 · NVIDIA/TensorRT
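
For example, here is a minimal sketch of standalone inference using the tensor-name API available since TensorRT 8.6. The engine path "model.engine" is a placeholder, the input is assumed to be I/O tensor 0, and the engine is assumed to have fixed shapes (as a DeepStream b1 engine does):

    import numpy as np
    import pycuda.driver as cuda
    import pycuda.autoinit
    import tensorrt as trt

    # Load and deserialize the engine ("model.engine" is a placeholder path)
    logger = trt.Logger(trt.Logger.WARNING)
    with open("model.engine", "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    stream = cuda.Stream()

    # Allocate one device buffer per I/O tensor and register its address
    buffers, host_outputs = {}, {}
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        shape = tuple(engine.get_tensor_shape(name))  # assumes fixed shapes
        dtype = trt.nptype(engine.get_tensor_dtype(name))
        buffers[name] = cuda.mem_alloc(int(np.prod(shape)) * np.dtype(dtype).itemsize)
        context.set_tensor_address(name, int(buffers[name]))
        if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
            host_outputs[name] = np.empty(shape, dtype=dtype)

    # Copy the preprocessed input in, run inference, copy the outputs out
    input_name = engine.get_tensor_name(0)  # assumes tensor 0 is the input
    x = np.ascontiguousarray(
        np.zeros(tuple(engine.get_tensor_shape(input_name)), dtype=np.float32))
    cuda.memcpy_htod_async(buffers[input_name], x, stream)
    context.execute_async_v3(stream.handle)
    for name, out in host_outputs.items():
        cuda.memcpy_dtoh_async(out, buffers[name], stream)
    stream.synchronize()  # wait before reading outputs or freeing buffers
    print(host_outputs)

Allocating the buffers once up front and synchronizing the stream before reading or freeing them may also help with the illegal-memory-access error described above.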