Description
My goal is to create an application for gesture recognition based on the hand-landmark output of the MediaPipe library.
For that, I created a custom model and trained it on my own data:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

def build_model(input_shape, output_shape):
    print("Input shape " + str(input_shape))
    print("Output shape " + str(output_shape))
    model = Sequential([
        Dense(1024, input_shape=(63,), activation='relu'),
        Dropout(0.05),
        Dense(512, activation='relu'),
        Dropout(0.05),
        Dense(256, activation='relu'),
        Dense(output_shape, activation='softmax'),
    ])
    model.compile(Adam(learning_rate=.0001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # model.build(input_shape=(None, *input_shape))
    model.summary()
    return model
```
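Training is then a standard fit() call. A minimal sketch with dummy data (the class count, epochs, and data below are illustrative assumptions, not my real dataset):

```python
import numpy as np

# Illustrative dummy data (my real dataset differs): N samples of
# 63 landmark features, integer labels for sparse_categorical_crossentropy.
num_classes = 5
X = np.random.rand(1000, 63).astype(np.float32)
y = np.random.randint(0, num_classes, size=(1000,))

model = build_model((63,), num_classes)
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.1)
model.save("./models/model_mine.keras")
```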
From MediaPipe I get a vector of 63 elements (21 hand landmarks × x/y/z coordinates), which I pass to my custom model for inference.
```python
import time
import mediapipe as mp
from tensorflow.keras.models import load_model

model = load_model("./models/model_mine.keras", compile=True)
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False, max_num_hands=2,
                       min_detection_confidence=0.5, min_tracking_confidence=0.5)
mp_draw = mp.solutions.drawing_utils
```
...
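For context, in the elided part vect is assembled from the MediaPipe landmarks, roughly like this (a simplified sketch with illustrative names such as rgb_frame, not my exact code):

```python
import numpy as np

# Sketch: flatten the 21 landmarks of the first detected hand
# (x, y, z each) into a (1, 63) float array for the Keras model.
results = hands.process(rgb_frame)  # rgb_frame: the current camera frame in RGB
if results.multi_hand_landmarks:
    lm = results.multi_hand_landmarks[0]
    vect = np.array([[p.x, p.y, p.z] for p in lm.landmark],
                    dtype=np.float32).reshape(1, 63)
```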
```python
t1 = time.time()
predictions = model.predict(vect, verbose=0)
print(time.time() - t1)
```
The inference time for such a primitive model is over 100 ms. What bothers me is that there is no visible increase in CPU or GPU utilization, so I cannot say with certainty that TF is using the GPU.
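One thing worth noting: as far as I know, model.predict() carries per-call dispatch overhead and the very first call also traces the graph, so timing a single call can be misleading. A sketch of a fairer measurement (warm-up first, then averaging; vect shaped (1, 63)):

```python
import time
import numpy as np

vect = np.random.rand(1, 63).astype(np.float32)  # dummy input with the real shape

_ = model.predict(vect, verbose=0)  # warm-up: the first call traces the graph

n = 100
t1 = time.time()
for _ in range(n):
    model.predict(vect, verbose=0)
print("predict():", (time.time() - t1) / n * 1000, "ms/call")

# Calling the model directly avoids predict()'s per-call setup overhead,
# which dominates for tiny single-sample batches.
t1 = time.time()
for _ in range(n):
    model(vect, training=False)
print("model():", (time.time() - t1) / n * 1000, "ms/call")
```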
When I import TensorFlow, it looks like it has GPU support:
```
root@tegra-ubuntu:/home# python3.10
Python 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/usr/local/lib/python3.10/dist-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.
  warnings.warn("Unable to import Axes3D. This may be due to multiple versions of "
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2025-04-21 21:31:21.856474: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2025-04-21 21:31:21.907801: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2025-04-21 21:31:21.908188: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
Num GPUs Available: 1
```
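As I understand it, a listed GPU only means TensorFlow can see the device, not that my ops actually run on it. Device-placement logging can confirm where kernels execute; a minimal sketch:

```python
import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # enable before running any ops

# A trivial op; the log should report placement on /device:GPU:0
x = tf.random.uniform((1, 63))
print(tf.reduce_sum(x))
```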
Do you have any suggestions as to why the inference is so slow?
Environment
TensorRT Version: 10.3.0
GPU Type: Jetson Orin Nano
Nvidia Driver Version:
CUDA Version: 12.6
CUDNN Version: 9.3.0.75
Operating System + Version: JetPack 6.2
Python Version (if applicable): 3.10
TensorFlow Version (if applicable): 2.17.0
PyTorch Version (if applicable):
Baremetal or Container (if container, which image + tag): Own container with MediaPipe built with GPU support