Hello!
I am currently working with a pre-created ONNX model. The model appears to have been exported with a fixed input shape of (10, 3, 32, 32). Does this mean the engine I build from it will only work with inputs of shape (10, 3, 32, 32)?
If so, how do I generalise the input shape?
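From what I have read, an engine built from an ONNX model with a static input shape is locked to that shape, and generalising it requires exporting the model with a dynamic batch axis and then building the engine with an optimization profile. Is something like the following the right approach? (A sketch only; the tensor name "input" and the min/opt/max shapes are my assumptions, and builder/network refer to the objects created in build_engine below.)

config = builder.create_builder_config()
config.max_workspace_size = 1 << 32
profile = builder.create_optimization_profile()
# "input" is a placeholder; the real name comes from the ONNX model's input tensor
profile.set_shape("input", min=(1, 3, 32, 32), opt=(10, 3, 32, 32), max=(32, 3, 32, 32))
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)
# At inference time the concrete shape would then be set on the context:
# context.set_binding_shape(0, (batch_size, 3, 32, 32))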
Another issue I am facing is that the engine returns garbage values at inference time: every output element is either -4.3160208e+08 or 0.
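To rule out the ONNX model itself, I plan to sanity-check it with onnxruntime first, roughly like this (dummy input shaped to match the model):

import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("model.onnx")
dummy = np.random.rand(10, 3, 32, 32).astype(np.float32)
outputs = sess.run(None, {sess.get_inputs()[0].name: dummy})
print([o.shape for o in outputs])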
Here is the code that I am using:
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # initialises the CUDA driver context

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)
model_path = 'model.onnx'
def print_network(network):
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        print("\nLAYER {}".format(i))
        print("===========================================")
        layer_input = layer.get_input(0)
        if layer_input:
            print("\tInput Name: {}".format(layer_input.name))
            print("\tInput Shape: {}".format(layer_input.shape))
        layer_output = layer.get_output(0)
        if layer_output:
            print("\tOutput Name: {}".format(layer_output.name))
            print("\tOutput Shape: {}".format(layer_output.shape))
        print("===========================================")
def build_engine(model_path):
    # flags = 1 corresponds to the EXPLICIT_BATCH network creation flag
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(flags=1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 32
        builder.max_batch_size = 1
        builder.fp16_mode = False
        with open(model_path, 'rb') as f:
            value = parser.parse(f.read())
        print("Parser: ", value)
        engine = builder.build_cuda_engine(network)
        print_network(network)
        print(engine)
        return engine
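While debugging the parse step, I also dump the parser errors right after parser.parse(...) inside build_engine, in case it fails silently:

if not value:
    for i in range(parser.num_errors):
        print(parser.get_error(i))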
engine = build_engine(model_path)
buf = engine.serialize()
with open("ssh.engine", 'wb') as f:
    f.write(buf)
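For completeness, this is how I load the serialized engine back later (using the runtime created at the top; a sketch):

with open("ssh.engine", 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())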
# Create buffers.
print(imu.dtype, "DTYPE")
context = engine.create_execution_context()
print(engine.get_binding_shape(0), engine.get_binding_shape(1), engine.get_binding_shape(2), engine.get_binding_shape(3), engine.get_binding_shape(4), "binding shape")
#print(imu[0].shape, "imu shape")
h_input = cuda.pagelocked_empty(trt.volume((10, 3, 32, 32)), dtype=np.float32)
h_output_hmap = cuda.pagelocked_empty(trt.volume((1, 1, 8, 8)), dtype=np.float32)
h_output_scale = cuda.pagelocked_empty(trt.volume((1, 2, 8, 8)), dtype=np.float32)
h_output_offset = cuda.pagelocked_empty(trt.volume((1, 2, 8, 8)), dtype=np.float32)
# Allocate device memory for inputs and outputs.
print(h_input.nbytes, “Hinput”)
print(h_output_hmap.nbytes, “Houtput_hmap”)
print(h_output_scale.nbytes, “Houtput_scale”)
print(h_output_offset.nbytes, “Houtput_offset”)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output_hmap = cuda.mem_alloc(h_output_hmap.nbytes)
d_output_scale = cuda.mem_alloc(h_output_scale.nbytes)
d_output_offset = cuda.mem_alloc(h_output_offset.nbytes)
bindings = [int(d_input), int(d_output_hmap), int(d_output_scale), int(d_output_offset)]
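To rule out mistakes in the hard-coded shapes above, I also considered allocating the buffers generically from the engine's own bindings, roughly like this:

h_buffers, d_buffers, generic_bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    h_buf = cuda.pagelocked_empty(trt.volume(shape), dtype=dtype)  # host buffer
    d_buf = cuda.mem_alloc(h_buf.nbytes)                           # device buffer
    h_buffers.append(h_buf)
    d_buffers.append(d_buf)
    generic_bindings.append(int(d_buf))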
# Create a stream in which to copy inputs/outputs and run inference.
stream = cuda.Stream()
print(np.ascontiguousarray(np.array(imu[0])).shape)
with engine.create_execution_context() as context:
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input, np.ascontiguousarray(np.array(imu)), stream)
    # Run inference.
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output_hmap, d_output_hmap, stream)
    cuda.memcpy_dtoh_async(h_output_scale, d_output_scale, stream)
    cuda.memcpy_dtoh_async(h_output_offset, d_output_offset, stream)
    # Synchronize the stream.
    stream.synchronize()
    # Inspect the host output shapes.
    print(h_output_hmap.shape,
          h_output_scale.shape,
          h_output_offset.shape)
# print(h_output_hmap, "HMAP")
print(h_output_hmap)
# print(h_output_scale)
# print(h_output_offset)
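One thing I am not sure about: cuda.memcpy_htod_async copies raw bytes, so if imu is not already float32 the engine would read garbage. A staging step I am considering, assuming imu has shape (10, 3, 32, 32):

# Hypothetical staging: force float32 and copy through the pinned host buffer
np.copyto(h_input, np.asarray(imu, dtype=np.float32).ravel())
cuda.memcpy_htod_async(d_input, h_input, stream)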