Description
My goal is to create an application for gesture recognition based on the hand-landmark output of the MediaPipe library.
For that, I created a custom model and trained it on my own data:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

def build_model(input_shape, output_shape):
    print("Input shape " + str(input_shape))
    print("Output shape " + str(output_shape))
    model = Sequential([
        Dense(1024, input_shape=(63,), activation='relu'),
        Dropout(0.05),
        Dense(512, activation='relu'),
        Dropout(0.05),
        Dense(256, activation='relu'),
        Dense(output_shape, activation='softmax'),
    ])
    model.compile(Adam(learning_rate=.0001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # model.build(input_shape=(None, *input_shape))
    model.summary()
    return model
```
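Training is then a standard fit() call. A minimal sketch with dummy data (the class count, epochs, and data below are illustrative assumptions, not my real dataset):

```python
import numpy as np

# Illustrative dummy data (my real dataset differs): N samples of
# 63 landmark features, integer labels for sparse_categorical_crossentropy.
num_classes = 5
X = np.random.rand(1000, 63).astype(np.float32)
y = np.random.randint(0, num_classes, size=(1000,))

model = build_model((63,), num_classes)
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.1)
model.save("./models/model_mine.keras")
```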
From MediaPipe I get a vector of 63 elements (21 hand landmarks × x/y/z coordinates), which I pass to my custom model for inference.
```python
import time
import mediapipe as mp
from tensorflow.keras.models import load_model

model = load_model("./models/model_mine.keras", compile=True)
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False, max_num_hands=2,
                       min_detection_confidence=0.5, min_tracking_confidence=0.5)
mp_draw = mp.solutions.drawing_utils
```
...
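For context, in the elided part vect is assembled from the MediaPipe landmarks, roughly like this (a simplified sketch with illustrative names such as rgb_frame, not my exact code):

```python
import numpy as np

# Sketch: flatten the 21 landmarks of the first detected hand
# (x, y, z each) into a (1, 63) float array for the Keras model.
results = hands.process(rgb_frame)  # rgb_frame: the current camera frame in RGB
if results.multi_hand_landmarks:
    lm = results.multi_hand_landmarks[0]
    vect = np.array([[p.x, p.y, p.z] for p in lm.landmark],
                    dtype=np.float32).reshape(1, 63)
```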
```python
t1 = time.time()
predictions = model.predict(vect, verbose=0)
print(time.time() - t1)
```
The inference time for such a primitive model is over 100 ms. What bothers me is that there is no visible increase in CPU or GPU utilization, so I cannot say with certainty that TF is using the GPU.
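One thing worth noting: as far as I know, model.predict() carries per-call dispatch overhead and the very first call also traces the graph, so timing a single call can be misleading. A sketch of a fairer measurement (warm-up first, then averaging; vect shaped (1, 63)):

```python
import time
import numpy as np

vect = np.random.rand(1, 63).astype(np.float32)  # dummy input with the real shape

_ = model.predict(vect, verbose=0)  # warm-up: the first call traces the graph

n = 100
t1 = time.time()
for _ in range(n):
    model.predict(vect, verbose=0)
print("predict():", (time.time() - t1) / n * 1000, "ms/call")

# Calling the model directly avoids predict()'s per-call setup overhead,
# which dominates for tiny single-sample batches.
t1 = time.time()
for _ in range(n):
    model(vect, training=False)
print("model():", (time.time() - t1) / n * 1000, "ms/call")
```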
When I import TensorFlow, it looks like it has GPU support:
```
root@tegra-ubuntu:/home# python3.10
Python 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/usr/local/lib/python3.10/dist-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
  warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2025-04-21 21:31:21.856474: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2025-04-21 21:31:21.907801: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2025-04-21 21:31:21.908188: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
Num GPUs Available: 1
```
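As I understand it, a listed GPU only means TensorFlow can see the device, not that my ops actually run on it. Device-placement logging can confirm where kernels execute; a minimal sketch:

```python
import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # enable before running any ops

# A trivial op; the log should report placement on /device:GPU:0
x = tf.random.uniform((1, 63))
print(tf.reduce_sum(x))
```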
Do you have any suggestions as to why the inference is so slow?
Environment
TensorRT Version: 10.3.0
GPU Type: Jetson Orin Nano
Nvidia Driver Version:
CUDA Version: 12.6
CUDNN Version: 9.3.0.75
Operating System + Version: JetPack 6.2
Python Version (if applicable): 3.10
TensorFlow Version (if applicable): 2.17.0
PyTorch Version (if applicable):
Baremetal or Container (if container, which image + tag): Own container with MediaPipe built with GPU support