"Device memory is insufficient to use tactic" error when converting a model in SavedModel format to a TensorRT model on Jetson Nano

rohansada000 · December 13, 2021, 9:50am

Description

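Converting a model in SavedModel format to a TensorRT model with TF-TRT on a Jetson Nano logs "Device memory is insufficient to use tactic" errors (full output below).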

Environment

TensorRT Version: 8.0.1
GPU Type: 128 core Maxwell GPU
Nvidia Driver Version:
CUDA Version: 10.1
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable): 2.5
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Model to be converted to TensorRT:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

deco = Sequential([
    Conv2D(64,(3,3),activation='relu',padding='same',input_shape=(100,100,64),name='c2'),
    Conv2D(32,(3,3),activation='relu',padding='same'),
    Conv2D(16,(3,3),activation='relu',padding='same',name='c3'),
    Conv2D(1,(3,3),activation='relu',padding='same',name='c4'),
])
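
As a rough size check, the parameter count and per-layer activation size of the model above can be worked out by hand. The sketch below assumes float32 tensors, a batch size of 1, and the standard Conv2D parameter formula (kernel_h * kernel_w * in_channels * out_channels + out_channels). The helper name conv2d_stats is made up for illustration.

```python
# Rough size estimate for the four-layer Conv2D model above.
# Assumptions: float32 (4 bytes), batch size 1, 'same' padding keeps 100x100.

def conv2d_stats(in_ch, out_ch, kernel=3, height=100, width=100, dtype_bytes=4):
    """Return (parameter count, output activation bytes) for one Conv2D layer."""
    params = kernel * kernel * in_ch * out_ch + out_ch  # weights + biases
    activation_bytes = height * width * out_ch * dtype_bytes
    return params, activation_bytes

# (input channels, output channels) for each layer, read off the model definition
layers = [(64, 64), (64, 32), (32, 16), (16, 1)]
stats = [conv2d_stats(i, o) for i, o in layers]

total_params = sum(p for p, _ in stats)
largest_activation = max(a for _, a in stats)

print(f"total parameters: {total_params}")
print(f"largest activation: {largest_activation / 1e6:.2f} MB")
```

Under these assumptions the weights and activations come to only a few megabytes, so the several-hundred-megabyte requests in the log output probably reflect scratch workspace requested by individual TensorRT tactics rather than the size of the model itself.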

Code to convert to TensorRT:
import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu_devices[0], True)
tf.config.experimental.set_virtual_device_configuration(
    gpu_devices[0],
    [tf.config.experimental.VirtualDeviceConfiguration(
        memory_limit=1800)])  # Crucial value, set lower than available GPU memory (note that Jetson shares GPU memory with CPU)
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
conversion_params = conversion_params._replace(max_workspace_size_bytes=(1500000000))
conversion_params = conversion_params._replace(precision_mode="FP16")
encoder_model = trt.TrtGraphConverterV2(
    input_saved_model_dir='/home/rohan/Desktop/original_models/decoder',
    conversion_params=conversion_params)
def input_fn():
    # Substitute with your input size
    Inp1 = np.random.normal(size=(1, 100, 100, 64)).astype(np.float32)
    yield (Inp1, )
encoder_model.convert()
encoder_model.build(input_fn=input_fn)
encoder_model.save(output_saved_model_dir='/home/rohan/Desktop/converted_models/decoder')

Steps To Reproduce

Output:
2021-08-26 20:05:37.528444: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-26 20:05:46.600192: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-08-26 20:05:46.678156: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:46.678328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-08-26 20:05:46.678428: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-26 20:05:46.867047: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2021-08-26 20:05:46.867313: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
2021-08-26 20:05:46.941017: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-08-26 20:05:47.048284: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-08-26 20:05:47.169099: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2021-08-26 20:05:47.251010: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
2021-08-26 20:05:47.254379: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-08-26 20:05:47.254659: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:47.254903: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:47.255031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-26 20:05:47.973614: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.8
2021-08-26 20:05:48.810263: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:48.810443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-08-26 20:05:48.810677: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:48.810877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:48.810956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-26 20:05:48.811082: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-26 20:05:53.353701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-26 20:05:53.353809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-26 20:05:53.353853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-26 20:05:53.354184: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:53.354487: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:53.354712: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:53.354853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1800 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-08-26 20:06:12.966546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:13.072032: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-08-26 20:06:13.285829: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-08-26 20:06:13.800398: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:13.927121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-08-26 20:06:13.927662: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:13.927935: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:13.971009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-26 20:06:14.084061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-26 20:06:14.084206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-26 20:06:14.084282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-26 20:06:14.185676: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:14.444806: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:14.450256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1800 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-08-26 20:06:14.898454: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 19200000 Hz
2021-08-26 20:06:16.930787: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1171] Optimization results for grappler item: graph_to_optimize
function_optimizer: Graph size after: 42 nodes (31), 57 edges (46), time = 302.428ms.
function_optimizer: function_optimizer did nothing. time = 0.3ms.

2021-08-26 20:06:18.423304: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.430558: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-08-26 20:06:18.445344: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-08-26 20:06:18.489187: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.489404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-08-26 20:06:18.489628: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.489817: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.489900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-26 20:06:18.517767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-26 20:06:18.517894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-26 20:06:18.517956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-26 20:06:18.518379: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.518895: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.519155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1800 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-08-26 20:06:19.126605: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:790] There are 5 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation).
2021-08-26 20:06:19.154807: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:759] Number of TensorRT candidate segments: 1
2021-08-26 20:06:19.158407: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:853] Replaced segment 0 consisting of 27 nodes by TRTEngineOp_0_0.
2021-08-26 20:06:19.334990: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1171] Optimization results for grappler item: tf_graph
constant_folding: Graph size after: 26 nodes (-16), 41 edges (-16), time = 91.128ms.
layout: Graph size after: 30 nodes (4), 45 edges (4), time = 162.196ms.
constant_folding: Graph size after: 30 nodes (0), 45 edges (0), time = 28.984ms.
TensorRTOptimizer: Graph size after: 4 nodes (-26), 3 edges (-42), time = 93.251ms.
constant_folding: Graph size after: 4 nodes (0), 3 edges (0), time = 1.673ms.
Optimization results for grappler item: TRTEngineOp_0_0_native_segment
constant_folding: Graph size after: 29 nodes (0), 36 edges (0), time = 18.628ms.
layout: Graph size after: 29 nodes (0), 36 edges (0), time = 4.067ms.
constant_folding: Graph size after: 29 nodes (0), 36 edges (0), time = 3.565ms.
TensorRTOptimizer: Graph size after: 29 nodes (0), 36 edges (0), time = 0.388ms.
constant_folding: Graph size after: 29 nodes (0), 36 edges (0), time = 3.414ms.

2021-08-26 20:06:23.723168: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-08-26 20:06:24.605054: I tensorflow/compiler/tf2tensorrt/common/utils.cc:58] Linked TensorRT version: 8.0.1
2021-08-26 20:06:25.057608: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.8
2021-08-26 20:06:25.103550: I tensorflow/compiler/tf2tensorrt/common/utils.cc:60] Loaded TensorRT version: 8.0.1
2021-08-26 20:06:25.871391: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer_plugin.so.8
2021-08-26 20:06:41.975207: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger It is suggested to disable layer timing cache while using AlgorithmSelector. Please refer to the developer guide in Developer Guide :: NVIDIA Deep Learning TensorRT Documentation.
2021-08-26 20:07:57.835013: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Detected invalid timing cache, setup a local cache instead
2021-08-26 20:08:28.785340: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Tactic Device request: 533MB Available: 169MB. Device memory is insufficient to use tactic.
2021-08-26 20:08:31.600899: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Skipping tactic 3 due to oom error on requested size of 533 detected for tactic 4.
2021-08-26 20:08:38.405814: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Tactic Device request: 530MB Available: 192MB. Device memory is insufficient to use tactic.
2021-08-26 20:08:38.731835: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Skipping tactic 3 due to oom error on requested size of 530 detected for tactic 4.
2021-08-26 20:08:39.665473: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Tactic Device request: 271MB Available: 201MB. Device memory is insufficient to use tactic.
2021-08-26 20:08:39.665709: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Skipping tactic 3 due to oom error on requested size of 271 detected for tactic 4.
2021-08-26 20:08:40.284923: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Tactic Device request: 270MB Available: 204MB. Device memory is insufficient to use tactic.
2021-08-26 20:08:40.285156: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Skipping tactic 3 due to oom error on requested size of 270 detected for tactic 4.
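
Each tactic error above reports a requested and an available size. A minimal sketch for pulling those pairs out of a saved log and computing the shortfall follows. The regular expression is written against the message format seen in this log only, and the function name tactic_shortfalls is hypothetical.

```python
import re

# Matches the TensorRT message format seen in the log above.
TACTIC_OOM = re.compile(r"Tactic Device request: (\d+)MB Available: (\d+)MB")

def tactic_shortfalls(log_text):
    """Return a list of (requested_mb, available_mb, shortfall_mb) tuples."""
    results = []
    for match in TACTIC_OOM.finditer(log_text):
        requested, available = int(match.group(1)), int(match.group(2))
        results.append((requested, available, requested - available))
    return results

# The four error messages from the log, with timestamps trimmed.
sample = """\
DefaultLogger Tactic Device request: 533MB Available: 169MB. Device memory is insufficient to use tactic.
DefaultLogger Tactic Device request: 530MB Available: 192MB. Device memory is insufficient to use tactic.
DefaultLogger Tactic Device request: 271MB Available: 201MB. Device memory is insufficient to use tactic.
DefaultLogger Tactic Device request: 270MB Available: 204MB. Device memory is insufficient to use tactic.
"""

for requested, available, shortfall in tactic_shortfalls(sample):
    print(f"requested {requested} MB, available {available} MB, short by {shortfall} MB")
```

Every request in this log exceeds the memory available at that moment, by between 66 MB and 364 MB.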

nadeemm · December 14, 2021, 11:22pm

I'll move this question over to the Jetson Nano forum; the team there has more experience with the Nano.

AastaLLL · December 15, 2021, 3:18am

Hi,

This is an out-of-memory error.

Please note that the Nano has only 4 GiB of memory, which is shared between the CPU and GPU.
So there is limited room for deploying a complicated model.
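
Because the CPU and GPU draw from the same pool, one option is to check how much memory is actually free just before conversion and size max_workspace_size_bytes from that. The following is a minimal sketch, assuming a Linux /proc/meminfo with a MemAvailable field. The helper names and the 50% fraction are arbitrary illustrations, not NVIDIA recommendations, and the sample values are made up.

```python
def mem_available_kib(meminfo_text):
    """Extract MemAvailable (in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")

def suggest_workspace_bytes(available_kib, fraction=0.5):
    """Hypothetical heuristic: use a fixed fraction of the available memory."""
    return int(available_kib * 1024 * fraction)

# Made-up sample contents; on the device, read the real file instead:
#   meminfo_text = open("/proc/meminfo").read()
example = (
    "MemTotal:        4051048 kB\n"
    "MemFree:          512000 kB\n"
    "MemAvailable:    1048576 kB\n"
)

available = mem_available_kib(example)
workspace = suggest_workspace_bytes(available)
print(f"MemAvailable: {available} KiB, suggested workspace: {workspace} bytes")
```

The resulting value could be passed as max_workspace_size_bytes in the conversion parameters. Whether a smaller workspace avoids the skipped tactics on a given board is something to verify by experiment.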

Thanks.

system · January 5, 2022, 5:32am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
