TensorRT with dynamic batch: setBindingDimensions error

Description

After re-exporting a SimSwap ONNX model with dynamic batch dimensions and converting it to a TensorRT engine, inference at batch size 16 fails with a setBindingDimensions API Usage Error (the profile's min and max batch are both 1), followed by a segmentation fault.

Environment

TensorRT Version: 8.2.4
GPU Type: RTX 2080 Ti
Nvidia Driver Version: 470.57.02
CUDA Version: 11.4
CUDNN Version: 8.4.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): Python 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.12.0+cu116
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:22.04-py3

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

First, I used original.plan, which was built from original.onnx, and got correct results. I wanted to run this model with batching, so I re-exported dynamic_batch_folded.onnx with the code below.
dynamic_axes = {'input_0': {0: 'batch_size'},
                'input_1': {1: 'batch_size'},
                'output_0': {0: 'batch_size'}}
torch.onnx.export(
    scripted_module,
    (x1, x2),
    'dynamic_batch_Generator_Adain_Upsample_torchscript.onnx',
    input_names=['x1', 'x2'],
    output_names=['outputs'],
    export_params=True,
    # example_outputs=scripted_module(x1, x2),
    opset_version=11,
    dynamic_axes=dynamic_axes)
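One thing worth double-checking in the export above: the dynamic_axes keys ('input_0', 'input_1', 'output_0') do not match the names passed via input_names and output_names ('x1', 'x2', 'outputs'). torch.onnx.export can only attach dynamic axes to names it recognizes, so with mismatched keys the exported graph may keep static batch dimensions — which would be consistent with the 1x3x224x224 / 1x512 bindings trtexec reports later. A sketch of a matching mapping (assuming the batch dimension of both inputs is axis 0, since x2 has shape 1x512):

```python
# Keys must match input_names/output_names passed to torch.onnx.export;
# unmatched keys are ignored and those dimensions stay static in the graph.
dynamic_axes = {
    'x1':      {0: 'batch_size'},   # input x1: NxCxHxW, batch on axis 0
    'x2':      {0: 'batch_size'},   # input x2: Nx512, batch on axis 0
    'outputs': {0: 'batch_size'},   # output: NxCxHxW, batch on axis 0
}
```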
and then sanitized it with Polygraphy:
polygraphy surgeon sanitize dynamic_batch_Generator_Adain_Upsample_torchscript.onnx \
    --fold-constants \
    -o dynamic_batch_folded.onnx
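Before building an engine, it can help to confirm the sanitized model actually ended up with a dynamic batch dimension. Polygraphy can print the model's input and output shapes (a quick check, assuming the file name above):

```shell
# The batch axis should show as a symbolic name (e.g. 'batch_size') or -1,
# not a fixed 1.
polygraphy inspect model dynamic_batch_folded.onnx
```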

Then I ran simswap2trt.py to build dynamic_batch.plan.
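The later error message ("maximum dimension in profile is 1") indicates the engine was built with an optimization profile whose batch range is [1, 1]. Whatever simswap2trt.py does internally, the build step has to declare the intended batch range explicitly. With trtexec, that would look roughly like this (input names and shapes assumed from the bindings reported further down):

```shell
trtexec --onnx=dynamic_batch_folded.onnx \
        --minShapes=x1:1x3x224x224,x2:1x512 \
        --optShapes=x1:8x3x224x224,x2:8x512 \
        --maxShapes=x1:16x3x224x224,x2:16x512 \
        --saveEngine=dynamic_batch.plan
```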

Finally, I ran simswapRuntrt2.py for inference and got the result below.

root@9dd4dce9103b:/workspace/simswap2trt/2trt# python simswapRuntrt2.py 
Read engine from dynamic_batch.plan
[07/22/2022-04:26:09] [TRT] [E] 3: [executionContext.cpp::setBindingDimensions::946] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::946, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [16,3,244,244] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 1, minimum dimension in profile is 1, but supplied dimension is 16.
)
[07/22/2022-04:26:09] [TRT] [E] 3: [executionContext.cpp::setBindingDimensions::946] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::946, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [16,512] for bindings[1] exceed min ~ max range at index 0, maximum dimension in profile is 1, minimum dimension in profile is 1, but supplied dimension is 16.
)
(3, 224, 224)
(1, 2408448)
-78002.5
[07/22/2022-04:26:09] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)
[07/22/2022-04:26:09] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::29] Error Code 1: Cuda Driver (invalid device context)
Segmentation fault (core dumped)

I think I made a mistake in simswapRuntrt2.py (e.g. in the inputs or a reshape), because the trtexec result looks correct with batching (below).
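The failing check in the log is TensorRT validating each requested binding dimension against the engine's optimization profile (condition: profileMaxDims.d[i] >= dimensions.d[i]). Note also that the supplied shape is [16,3,244,244] while the model expects 224x224 — the 244 looks like a typo in the inference script, and it would fail the same check even if the batch dimension were in range. A pure-Python sketch of the same per-dimension validation (a hypothetical helper, not the TensorRT API) shows why the request is rejected:

```python
def check_against_profile(requested, profile_min, profile_max):
    """Mirror TensorRT's setBindingDimensions check: every requested
    dimension must lie within [min, max] of the optimization profile.
    Returns (ok, index_of_first_violation)."""
    for i, (dim, lo, hi) in enumerate(zip(requested, profile_min, profile_max)):
        if not (lo <= dim <= hi):
            return False, i
    return True, -1

# Shapes from the error log: batch 16 exceeds the profile's max of 1,
# so the check fails at axis 0 before the 244-vs-224 mismatch is reached.
ok, axis = check_against_profile((16, 3, 244, 244),
                                 (1, 3, 224, 224),
                                 (1, 3, 224, 224))
# ok == False, axis == 0
```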

trtexec :

root@9dd4dce9103b:/workspace# trtexec --loadEngine=/workspace/simswap2trt/2trt/dynamic_batch.plan --batch=16
&&&& RUNNING TensorRT.trtexec [TensorRT v8204] # trtexec --loadEngine=/workspace/simswap2trt/2trt/dynamic_batch.plan --batch=16
[07/22/2022-04:32:29] [I] === Model Options ===
[07/22/2022-04:32:29] [I] Format: *
[07/22/2022-04:32:29] [I] Model: 
[07/22/2022-04:32:29] [I] Output:
[07/22/2022-04:32:29] [I] === Build Options ===
[07/22/2022-04:32:29] [I] Max batch: 16
[07/22/2022-04:32:29] [I] Workspace: 16 MiB
[07/22/2022-04:32:29] [I] minTiming: 1
[07/22/2022-04:32:29] [I] avgTiming: 8
[07/22/2022-04:32:29] [I] Precision: FP32
[07/22/2022-04:32:29] [I] Calibration: 
[07/22/2022-04:32:29] [I] Refit: Disabled
[07/22/2022-04:32:29] [I] Sparsity: Disabled
[07/22/2022-04:32:29] [I] Safe mode: Disabled
[07/22/2022-04:32:29] [I] DirectIO mode: Disabled
[07/22/2022-04:32:29] [I] Restricted mode: Disabled
[07/22/2022-04:32:29] [I] Save engine: 
[07/22/2022-04:32:29] [I] Load engine: /workspace/simswap2trt/2trt/dynamic_batch.plan
[07/22/2022-04:32:29] [I] Profiling verbosity: 0
[07/22/2022-04:32:29] [I] Tactic sources: Using default tactic sources
[07/22/2022-04:32:29] [I] timingCacheMode: local
[07/22/2022-04:32:29] [I] timingCacheFile: 
[07/22/2022-04:32:29] [I] Input(s)s format: fp32:CHW
[07/22/2022-04:32:29] [I] Output(s)s format: fp32:CHW
[07/22/2022-04:32:29] [I] Input build shapes: model
[07/22/2022-04:32:29] [I] Input calibration shapes: model
[07/22/2022-04:32:29] [I] === System Options ===
[07/22/2022-04:32:29] [I] Device: 0
[07/22/2022-04:32:29] [I] DLACore: 
[07/22/2022-04:32:29] [I] Plugins:
[07/22/2022-04:32:29] [I] === Inference Options ===
[07/22/2022-04:32:29] [I] Batch: 16
[07/22/2022-04:32:29] [I] Input inference shapes: model
[07/22/2022-04:32:29] [I] Iterations: 10
[07/22/2022-04:32:29] [I] Duration: 3s (+ 200ms warm up)
[07/22/2022-04:32:29] [I] Sleep time: 0ms
[07/22/2022-04:32:29] [I] Idle time: 0ms
[07/22/2022-04:32:29] [I] Streams: 1
[07/22/2022-04:32:29] [I] ExposeDMA: Disabled
[07/22/2022-04:32:29] [I] Data transfers: Enabled
[07/22/2022-04:32:29] [I] Spin-wait: Disabled
[07/22/2022-04:32:29] [I] Multithreading: Disabled
[07/22/2022-04:32:29] [I] CUDA Graph: Disabled
[07/22/2022-04:32:29] [I] Separate profiling: Disabled
[07/22/2022-04:32:29] [I] Time Deserialize: Disabled
[07/22/2022-04:32:29] [I] Time Refit: Disabled
[07/22/2022-04:32:29] [I] Skip inference: Disabled
[07/22/2022-04:32:29] [I] Inputs:
[07/22/2022-04:32:29] [I] === Reporting Options ===
[07/22/2022-04:32:29] [I] Verbose: Disabled
[07/22/2022-04:32:29] [I] Averages: 10 inferences
[07/22/2022-04:32:29] [I] Percentile: 99
[07/22/2022-04:32:29] [I] Dump refittable layers:Disabled
[07/22/2022-04:32:29] [I] Dump output: Disabled
[07/22/2022-04:32:29] [I] Profile: Disabled
[07/22/2022-04:32:29] [I] Export timing to JSON file: 
[07/22/2022-04:32:29] [I] Export output to JSON file: 
[07/22/2022-04:32:29] [I] Export profile to JSON file: 
[07/22/2022-04:32:29] [I] 
[07/22/2022-04:32:29] [I] === Device Information ===
[07/22/2022-04:32:29] [I] Selected Device: NVIDIA GeForce RTX 2080 Ti
[07/22/2022-04:32:29] [I] Compute Capability: 7.5
[07/22/2022-04:32:29] [I] SMs: 68
[07/22/2022-04:32:29] [I] Compute Clock Rate: 1.65 GHz
[07/22/2022-04:32:29] [I] Device Global Memory: 11011 MiB
[07/22/2022-04:32:29] [I] Shared Memory per SM: 64 KiB
[07/22/2022-04:32:29] [I] Memory Bus Width: 352 bits (ECC disabled)
[07/22/2022-04:32:29] [I] Memory Clock Rate: 7 GHz
[07/22/2022-04:32:29] [I] 
[07/22/2022-04:32:29] [I] TensorRT version: 8.2.4
[07/22/2022-04:32:29] [I] [TRT] [MemUsageChange] Init CUDA: CPU +321, GPU +0, now: CPU 458, GPU 2810 (MiB)
[07/22/2022-04:32:29] [I] [TRT] Loaded engine size: 125 MiB
[07/22/2022-04:32:30] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +513, GPU +222, now: CPU 993, GPU 3142 (MiB)
[07/22/2022-04:32:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +115, GPU +54, now: CPU 1108, GPU 3196 (MiB)
[07/22/2022-04:32:30] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +106, now: CPU 0, GPU 106 (MiB)
[07/22/2022-04:32:30] [I] Engine loaded in 0.77102 sec.
[07/22/2022-04:32:30] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 983, GPU 3188 (MiB)
[07/22/2022-04:32:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 983, GPU 3196 (MiB)
[07/22/2022-04:32:30] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +108, now: CPU 0, GPU 214 (MiB)
[07/22/2022-04:32:30] [I] Using random values for input x1
[07/22/2022-04:32:30] [I] Created input binding for x1 with dimensions 1x3x224x224
[07/22/2022-04:32:30] [I] Using random values for input x2
[07/22/2022-04:32:30] [I] Created input binding for x2 with dimensions 1x512
[07/22/2022-04:32:30] [I] Using random values for output outputs
[07/22/2022-04:32:30] [I] Created output binding for outputs with dimensions 1x3x224x224
[07/22/2022-04:32:30] [I] Starting inference
[07/22/2022-04:32:33] [I] Warmup completed 720 queries over 200 ms
[07/22/2022-04:32:33] [I] Timing trace has 13104 queries over 3.01215 s
[07/22/2022-04:32:33] [I] 
[07/22/2022-04:32:33] [I] === Trace details ===
[07/22/2022-04:32:33] [I] Trace averages of 10 runs:
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.98018 ms - Host latency: 4.11215 ms (end to end 7.84487 ms, enqueue 1.33547 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.92994 ms - Host latency: 4.0637 ms (end to end 7.75614 ms, enqueue 1.36016 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 4.02873 ms - Host latency: 4.15739 ms (end to end 7.86645 ms, enqueue 1.26375 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.93725 ms - Host latency: 4.0666 ms (end to end 7.80777 ms, enqueue 1.15752 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 5.20785 ms - Host latency: 5.33594 ms (end to end 8.88118 ms, enqueue 1.03383 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59381 ms - Host latency: 3.74604 ms (end to end 7.07 ms, enqueue 1.21404 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55174 ms - Host latency: 3.68585 ms (end to end 6.95418 ms, enqueue 1.29547 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.50179 ms - Host latency: 3.63696 ms (end to end 6.89099 ms, enqueue 1.3069 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.5007 ms - Host latency: 3.65161 ms (end to end 6.90971 ms, enqueue 1.24853 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.46813 ms - Host latency: 3.59454 ms (end to end 6.5165 ms, enqueue 0.980518 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.53429 ms - Host latency: 3.66541 ms (end to end 6.9504 ms, enqueue 1.07989 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.52716 ms - Host latency: 3.66108 ms (end to end 6.9616 ms, enqueue 1.33918 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.54235 ms - Host latency: 3.67697 ms (end to end 6.94681 ms, enqueue 1.28337 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.51631 ms - Host latency: 3.65081 ms (end to end 6.91194 ms, enqueue 1.37484 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.53799 ms - Host latency: 3.67263 ms (end to end 6.95026 ms, enqueue 1.28378 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.52659 ms - Host latency: 3.67059 ms (end to end 6.69996 ms, enqueue 0.744891 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.54593 ms - Host latency: 3.69485 ms (end to end 6.95508 ms, enqueue 1.15656 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.52643 ms - Host latency: 3.66205 ms (end to end 6.93148 ms, enqueue 1.33008 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.53888 ms - Host latency: 3.67315 ms (end to end 6.95117 ms, enqueue 1.3158 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.60441 ms - Host latency: 3.78463 ms (end to end 7.11707 ms, enqueue 1.34712 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.54152 ms - Host latency: 3.67959 ms (end to end 6.95185 ms, enqueue 1.38457 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55831 ms - Host latency: 3.69778 ms (end to end 6.9829 ms, enqueue 1.36529 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55759 ms - Host latency: 3.69093 ms (end to end 6.99427 ms, enqueue 1.34864 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55336 ms - Host latency: 3.68925 ms (end to end 6.98489 ms, enqueue 1.38036 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56681 ms - Host latency: 3.69576 ms (end to end 6.96532 ms, enqueue 1.2881 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.54551 ms - Host latency: 3.65859 ms (end to end 7.03368 ms, enqueue 1.11639 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.52333 ms - Host latency: 3.64954 ms (end to end 6.95886 ms, enqueue 1.28856 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56578 ms - Host latency: 3.69622 ms (end to end 6.99791 ms, enqueue 1.30562 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.57284 ms - Host latency: 3.70393 ms (end to end 7.00343 ms, enqueue 1.30751 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.61581 ms - Host latency: 3.74601 ms (end to end 7.09525 ms, enqueue 1.21525 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.62212 ms - Host latency: 3.75212 ms (end to end 7.11155 ms, enqueue 1.27651 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.64308 ms - Host latency: 3.76849 ms (end to end 7.15386 ms, enqueue 1.2597 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.52709 ms - Host latency: 3.64176 ms (end to end 6.70874 ms, enqueue 0.823645 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.54598 ms - Host latency: 3.6609 ms (end to end 7.02354 ms, enqueue 1.1005 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.57057 ms - Host latency: 3.69309 ms (end to end 7.00786 ms, enqueue 1.18674 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.54005 ms - Host latency: 3.66984 ms (end to end 6.95199 ms, enqueue 1.29946 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.65526 ms - Host latency: 3.7854 ms (end to end 7.20488 ms, enqueue 1.29462 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56161 ms - Host latency: 3.69133 ms (end to end 6.9955 ms, enqueue 1.25409 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.5587 ms - Host latency: 3.69036 ms (end to end 6.99054 ms, enqueue 1.31422 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.53956 ms - Host latency: 3.67096 ms (end to end 6.9442 ms, enqueue 1.38062 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59828 ms - Host latency: 3.72809 ms (end to end 7.03949 ms, enqueue 1.26989 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55997 ms - Host latency: 3.68719 ms (end to end 7.00551 ms, enqueue 1.15958 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55907 ms - Host latency: 3.68882 ms (end to end 6.9916 ms, enqueue 1.27778 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.54362 ms - Host latency: 3.69208 ms (end to end 6.97754 ms, enqueue 1.27937 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.58496 ms - Host latency: 3.71472 ms (end to end 7.04532 ms, enqueue 1.33318 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.5835 ms - Host latency: 3.71261 ms (end to end 7.03972 ms, enqueue 1.2418 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56118 ms - Host latency: 3.69109 ms (end to end 6.98959 ms, enqueue 1.28058 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56779 ms - Host latency: 3.69922 ms (end to end 6.99829 ms, enqueue 1.38412 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.70222 ms - Host latency: 3.82828 ms (end to end 7.25662 ms, enqueue 1.03427 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.84612 ms - Host latency: 3.9593 ms (end to end 7.64066 ms, enqueue 0.322131 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.82073 ms - Host latency: 3.94513 ms (end to end 7.62051 ms, enqueue 0.28158 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.8489 ms - Host latency: 3.96765 ms (end to end 7.63982 ms, enqueue 0.309351 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.81499 ms - Host latency: 3.92896 ms (end to end 7.60652 ms, enqueue 0.32395 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.82173 ms - Host latency: 3.93625 ms (end to end 7.57747 ms, enqueue 0.328394 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56108 ms - Host latency: 3.71655 ms (end to end 6.69768 ms, enqueue 0.512744 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.554 ms - Host latency: 3.68389 ms (end to end 6.96941 ms, enqueue 1.29712 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.60535 ms - Host latency: 3.73342 ms (end to end 7.08279 ms, enqueue 1.1979 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.61387 ms - Host latency: 3.74246 ms (end to end 7.10476 ms, enqueue 1.17771 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.6084 ms - Host latency: 3.73628 ms (end to end 7.10168 ms, enqueue 1.17942 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.62502 ms - Host latency: 3.75913 ms (end to end 7.08284 ms, enqueue 1.34148 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.58999 ms - Host latency: 3.71421 ms (end to end 7.09155 ms, enqueue 0.845459 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55767 ms - Host latency: 3.67178 ms (end to end 7.08818 ms, enqueue 0.89519 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56826 ms - Host latency: 3.70051 ms (end to end 6.99885 ms, enqueue 1.46643 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56104 ms - Host latency: 3.69641 ms (end to end 6.98821 ms, enqueue 1.40574 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55623 ms - Host latency: 3.68879 ms (end to end 6.97158 ms, enqueue 1.44214 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.57222 ms - Host latency: 3.69966 ms (end to end 6.60569 ms, enqueue 1.28193 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55229 ms - Host latency: 3.68137 ms (end to end 6.96648 ms, enqueue 1.33157 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59141 ms - Host latency: 3.72417 ms (end to end 7.04683 ms, enqueue 1.37668 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59805 ms - Host latency: 3.72883 ms (end to end 7.04824 ms, enqueue 1.33491 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.69229 ms - Host latency: 3.86697 ms (end to end 7.23455 ms, enqueue 1.30913 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.55923 ms - Host latency: 3.68899 ms (end to end 6.97507 ms, enqueue 1.40176 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.58091 ms - Host latency: 3.71218 ms (end to end 7.0488 ms, enqueue 1.43938 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59133 ms - Host latency: 3.72087 ms (end to end 7.04116 ms, enqueue 1.36069 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59199 ms - Host latency: 3.71992 ms (end to end 7.05903 ms, enqueue 1.22803 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.56267 ms - Host latency: 3.69009 ms (end to end 7.00349 ms, enqueue 1.17393 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59656 ms - Host latency: 3.72773 ms (end to end 7.05969 ms, enqueue 1.28499 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.62725 ms - Host latency: 3.75862 ms (end to end 7.10691 ms, enqueue 1.30515 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.5636 ms - Host latency: 3.68938 ms (end to end 7.01475 ms, enqueue 1.24045 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59043 ms - Host latency: 3.71777 ms (end to end 7.0762 ms, enqueue 1.19546 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.57095 ms - Host latency: 3.69998 ms (end to end 7.01826 ms, enqueue 1.25049 ms)
[07/22/2022-04:32:33] [I] Average on 10 runs - GPU latency: 3.59087 ms - Host latency: 3.72271 ms (end to end 7.04656 ms, enqueue 1.2001 ms)
[07/22/2022-04:32:33] [I] 
[07/22/2022-04:32:33] [I] === Performance summary ===
[07/22/2022-04:32:33] [I] Throughput: 4350.38 qps
[07/22/2022-04:32:33] [I] Latency: min = 3.34454 ms, max = 19.1245 ms, mean = 3.75742 ms, median = 3.66187 ms, percentile(99%) = 4.42883 ms
[07/22/2022-04:32:33] [I] End-to-End Host Latency: min = 3.59021 ms, max = 23.0414 ms, mean = 7.0977 ms, median = 7.03442 ms, percentile(99%) = 8.25531 ms
[07/22/2022-04:32:33] [I] Enqueue Time: min = 0.271484 ms, max = 1.49023 ms, mean = 1.1849 ms, median = 1.25244 ms, percentile(99%) = 1.4809 ms
[07/22/2022-04:32:33] [I] H2D Latency: min = 0.0513916 ms, max = 0.536621 ms, mean = 0.0754913 ms, median = 0.0737305 ms, percentile(99%) = 0.118652 ms
[07/22/2022-04:32:33] [I] GPU Compute Time: min = 3.21594 ms, max = 18.9751 ms, mean = 3.6259 ms, median = 3.53223 ms, percentile(99%) = 4.29877 ms
[07/22/2022-04:32:33] [I] D2H Latency: min = 0.0483398 ms, max = 0.0761719 ms, mean = 0.0560302 ms, median = 0.0556641 ms, percentile(99%) = 0.0603027 ms
[07/22/2022-04:32:33] [I] Total Host Walltime: 3.01215 s
[07/22/2022-04:32:33] [I] Total GPU Compute Time: 2.96961 s
[07/22/2022-04:32:33] [I] Explanations of the performance metrics are printed in the verbose logs.
[07/22/2022-04:32:33] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8204] # trtexec --loadEngine=/workspace/simswap2trt/2trt/dynamic_batch.plan --batch=16
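A caveat about this trtexec run: the bindings above were created as 1x3x224x224 and 1x512, so the engine appears to be executing at batch 1 here. The --batch flag applies only to implicit-batch engines, and for this explicit-batch engine it seems to only scale the reported query count, so the PASSED result does not actually demonstrate batch-16 inference. To benchmark the dynamic engine at batch 16, the input shapes would be passed explicitly instead, e.g.:

```shell
trtexec --loadEngine=/workspace/simswap2trt/2trt/dynamic_batch.plan \
        --shapes=x1:16x3x224x224,x2:16x512
```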

Hi,

Could you please share a minimal repro script and model with us for easier debugging?
Also, without the Polygraphy tool, are you facing the same issue?

Thank you.

Thank you for the reply.
I forgot to upload my code and files; I'll send you my repo by message.
Also, without the Polygraphy tool I can't build a TensorRT model at all, because of the ReflectionPad2d layer.

With the Polygraphy tool, the model ran well as long as I didn't use batching.

Thank you!

Hi,

Please share with us the dynamic_batch.onnx model as well.
We would like to reproduce the issue for better debugging.

Thank you.