Segmentation fault when updating from enqueueV2() to enqueueV3()

Description

Following up on my post about deprecated functions in TensorRT 8.5, I updated my code from enqueueV2 to enqueueV3.

From:

void GreenModel::LaunchInferenceAsyc() {
    cudaMemcpyAsync(buffer_bindings[BINDING_PTR_IDX_INPUT], input.data(), input.size() * sizeof(float), cudaMemcpyHostToDevice, stream);
    context->enqueueV2(buffer_bindings, stream, nullptr);
    cudaMemcpyAsync(output.data(), buffer_bindings[BINDING_PTR_IDX_OUTPUT], output.size() * sizeof(float), cudaMemcpyDeviceToHost, stream);
}

To:

void GreenModel::LaunchInferenceAsyc() {
    cudaMemcpyAsync(buffer_bindings[BINDING_PTR_IDX_INPUT], input.data(), input.size() * sizeof(float), cudaMemcpyHostToDevice, stream);
    context->enqueueV3(stream);
    cudaMemcpyAsync(output.data(), buffer_bindings[BINDING_PTR_IDX_OUTPUT], output.size() * sizeof(float), cudaMemcpyDeviceToHost, stream);
}

I’m getting a segmentation fault in:

bool enqueueV3(cudaStream_t stream) noexcept
{
    return mImpl->enqueueV3(stream);
}

It’s working fine with enqueueV2.
Am I missing an extra step here?

Environment

TensorRT Version: 8.5.2
GPU Type: NVIDIA Jetson AGX Orin
CUDA Version: 11.4
Operating System + Version: Ubuntu 20.04 (aarch64)

Hi @lioriz,
Could you please share the verbose logs here?

Thanks


@AakankshaS Here are the verbose logs:

INFO: Loaded engine size: 43 MiB
WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
VERBOSE: Trying to load shared library libcudnn.so.8
VERBOSE: Loaded shared library libcudnn.so.8
VERBOSE: Using cuDNN as plugin tactic source
INFO: [MemUsageChange] Init cuDNN: CPU +619, GPU +660, now: CPU 942, GPU 10444 (MiB)
INFO: [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +42, now: CPU 0, GPU 42 (MiB)
INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +3, now: CPU 901, GPU 10409 (MiB)
VERBOSE: Using cuDNN as core library tactic source
VERBOSE: Deserialization required 1644704 microseconds.
VERBOSE: Trying to load shared library libcudnn.so.8
VERBOSE: Loaded shared library libcudnn.so.8
VERBOSE: Using cuDNN as plugin tactic source
VERBOSE: Using cuDNN as core library tactic source
VERBOSE: Total per-runner device persistent memory is 0
VERBOSE: Total per-runner host persistent memory is 214528
VERBOSE: Allocated activation device memory of size 24331264
INFO: [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +23, now: CPU 0, GPU 65 (MiB)

Solved by calling setTensorAddress() on the execution context when initializing CUDA.

Pseudocode:

void InitCuda() {
    // Allocate device memory for the input and output tensors
    cudaMalloc(&buffer_bindings[BINDING_PTR_IDX_INPUT], input_size_bytes);
    cudaMalloc(&buffer_bindings[BINDING_PTR_IDX_OUTPUT], output_size_bytes);

    this->context = this->model_->createExecutionContext();

    // enqueueV3 no longer takes a bindings array, so the device addresses
    // must be registered on the context by tensor name
    context->setTensorAddress(input_image_blob_name_.c_str(), buffer_bindings[BINDING_PTR_IDX_INPUT]);
    context->setTensorAddress(output_blob_name_.c_str(), buffer_bindings[BINDING_PTR_IDX_OUTPUT]);
}

void LaunchInferenceAsyc() {
    // Copy the input into the device buffer that was registered with setTensorAddress()
    cudaMemcpyAsync(buffer_bindings[BINDING_PTR_IDX_INPUT], input.data(), input.size() * sizeof(float), cudaMemcpyHostToDevice, stream);

    // The tensor addresses were set in InitCuda(), so only the stream is passed
    context->enqueueV3(stream);

    cudaMemcpyAsync(output.data(), buffer_bindings[BINDING_PTR_IDX_OUTPUT], output.size() * sizeof(float), cudaMemcpyDeviceToHost, stream);
}
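
As a possible refinement (a sketch that is not in my actual code), the tensor names can also be queried from the engine instead of being hard-coded. This assumes the same buffer_bindings array and BINDING_PTR_IDX_INPUT / BINDING_PTR_IDX_OUTPUT indices as above, with exactly one input and one output tensor, using the TensorRT 8.5 I/O tensor API:

#include <NvInfer.h>

// Hypothetical helper: register every I/O tensor's device buffer on the context
// by asking the engine for the tensor names instead of hard-coding them.
void SetAllTensorAddresses(nvinfer1::ICudaEngine* engine,
                           nvinfer1::IExecutionContext* context,
                           void* const* buffer_bindings) {
    for (int32_t i = 0; i < engine->getNbIOTensors(); ++i) {
        char const* name = engine->getIOTensorName(i);
        // Assumption: the input binding slot holds the single input tensor,
        // the output slot holds the single output tensor
        bool const is_input =
            engine->getTensorIOMode(name) == nvinfer1::TensorIOMode::kINPUT;
        context->setTensorAddress(
            name,
            buffer_bindings[is_input ? BINDING_PTR_IDX_INPUT : BINDING_PTR_IDX_OUTPUT]);
    }
}

Also note that both cudaMemcpyAsync calls are asynchronous, so the host-side output buffer is only valid after the stream has been synchronized (for example with cudaStreamSynchronize(stream)) before reading output.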

Thanks to this issue and this code example.

In addition, this issue was very interesting and helped my understanding: enqueueV3 is slower than enqueueV2 · Issue #2877 · NVIDIA/TensorRT · GitHub.
