Test .engine generated with DeepStream using TensorRT

Hi,
I have this model (for image classification) whose engine was generated by DeepStream (mymodel.onnx_b1_gpu0_fp32.engine), and I want to test it on images outside of DeepStream. How can I run inference with it directly in TensorRT?

(TensorRT version: 8.6.1)

    import pycuda.driver as cuda
    import pycuda.autoinit
    import tensorrt as trt
    import numpy as np
    from PIL import Image
    import requests
    from io import BytesIO

    # Load the TensorRT engine
    logger = trt.Logger(trt.Logger.WARNING)
    with open("torch_yolov11.onnx_b1_gpu0_fp32.engine", "rb") as f:
        engine_data = f.read()

    # Deserialize the engine
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(engine_data)

    # Create an execution context
    context = engine.create_execution_context()

    # Load an image
    image_url = "https://ultralytics.com/images/bus.jpg"
    response = requests.get(image_url)
    image = Image.open(BytesIO(response.content)).convert("RGB")

    # Preprocess the image (assuming the model expects 224x224 RGB input)
    image = image.resize((224, 224))
    image = np.array(image).astype(np.float32)
    image = image.transpose((2, 0, 1))     # HWC -> CHW
    image = np.expand_dims(image, axis=0)  # add batch dimension

    # Allocate device memory for the input and output buffers
    input_data = cuda.mem_alloc(image.nbytes)
    output_data = cuda.mem_alloc(int(engine.max_batch_size * np.prod(engine.get_binding_shape(1)) * np.float32().itemsize))

    # Ensure the input image is contiguous in memory
    image = np.ascontiguousarray(image)

    # Transfer the input data to the GPU
    cuda.memcpy_htod(input_data, image)

    # Run inference
    context.execute_v2([int(input_data), int(output_data)])

    # Retrieve the output from the GPU
    output = np.empty(shape=engine.get_binding_shape(1), dtype=np.float32)
    cuda.memcpy_dtoh(output, output_data)

    print("Inference result:", output)


I used this code, but when I print the output, it displays:
    [01/20/2025-16:53:44] [TRT] [E] 2: [softMaxV2Runner.cpp::execute::213] Error Code 2: Internal Error (Assertion y != nullptr failed. )
    Inference result: [[-0.74447846 0.75425875]]
    /tmp/ipykernel_46848/2110792327.py:5: DeprecationWarning: Use get_tensor_shape instead.
      output = np.empty(shape=engine.get_binding_shape(1), dtype=np.float32)

Why does the result not display as probabilities (SoftMax output)?
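
For reference, softmax applied on the host to the logits printed above would look like this (a minimal numpy sketch; the logit values are just the ones copied from the output above):

    import numpy as np

    # Softmax applied on the host to the raw logits printed above
    logits = np.array([[-0.74447846, 0.75425875]])
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(probs)  # approximately [[0.1826 0.8174]]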

Hi, I get this error when I try to process more than one image. The first execution runs correctly, but when I run the next execution, this error is displayed. How can I resolve this problem?

Error:

    LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered

Code:

    input_data = cuda.mem_alloc(image.nbytes)
    output_shape = engine.get_binding_shape(1)
    output_size = int(np.prod(output_shape) * np.float32().itemsize)
    output_data = cuda.mem_alloc(output_size)

    # Ensure the input image is contiguous in memory
    image = np.ascontiguousarray(image)

    # Transfer input data to GPU
    cuda.memcpy_htod(input_data, image)

    # Run inference
    context.execute_v2([int(input_data), int(output_data)])

    # Allocate the host output buffer
    output = np.empty(output_shape, dtype=np.float32)

    # Retrieve output from GPU
    cuda.memcpy_dtoh(output, output_data)
    cuda.Context.synchronize()
    # Free device memory after inference (PyCUDA releases the allocation
    # once the last reference to it is dropped)
    del input_data
    del output_data

Any response?

Hi @amine.ghamgui.anavid, thanks for your patience. I've routed this to the right team and will get back to you with a response. I'm moving your post to the DeepStream forum so you can get help there!

The engine file generated by DeepStream nvinfer is a standard TensorRT engine file. There are samples showing how to run inference with the TensorRT Python API: TensorRT/samples/python at release/10.7 · NVIDIA/TensorRT
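
For example, here is a minimal sketch of standalone inference using the tensor-name API available since TensorRT 8.6. The engine path "model.engine" is a placeholder, the input is assumed to be I/O tensor 0, and the engine is assumed to have fixed shapes (as a DeepStream b1 engine does):

    import numpy as np
    import pycuda.driver as cuda
    import pycuda.autoinit
    import tensorrt as trt

    # Load and deserialize the engine ("model.engine" is a placeholder path)
    logger = trt.Logger(trt.Logger.WARNING)
    with open("model.engine", "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    stream = cuda.Stream()

    # Allocate one device buffer per I/O tensor and register its address
    buffers, host_outputs = {}, {}
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        shape = tuple(engine.get_tensor_shape(name))  # assumes fixed shapes
        dtype = trt.nptype(engine.get_tensor_dtype(name))
        buffers[name] = cuda.mem_alloc(int(np.prod(shape)) * np.dtype(dtype).itemsize)
        context.set_tensor_address(name, int(buffers[name]))
        if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
            host_outputs[name] = np.empty(shape, dtype=dtype)

    # Copy the preprocessed input in, run inference, copy the outputs out
    input_name = engine.get_tensor_name(0)  # assumes tensor 0 is the input
    x = np.ascontiguousarray(
        np.zeros(tuple(engine.get_tensor_shape(input_name)), dtype=np.float32))
    cuda.memcpy_htod_async(buffers[input_name], x, stream)
    context.execute_async_v3(stream.handle)
    for name, out in host_outputs.items():
        cuda.memcpy_dtoh_async(out, buffers[name], stream)
    stream.synchronize()  # wait before reading outputs or freeing buffers
    print(host_outputs)

Allocating the buffers once up front and synchronizing the stream before reading or freeing them may also help with the illegal-memory-access error described above.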