Issues when using tee with output-selector

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson Orin NX 16GB
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only): 6.1
• TensorRT Version: 10.7
• Issue Type (questions, new requirements, bugs): Questions, bugs
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing): deepstream_parallel_infer_app

I have tried to modify the DeepStream parallel inference app pipeline with some custom modifications as per my requirements, in both Python and C++. The problem is that the Python app freezes without any error, and the C++ app crashes with the following error:

deepstream-pipeline(+0x7a88)[0xaaaadacf7a88]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(+0x925a8)[0xffff8c8e25a8]
/lib/aarch64-linux-gnu/libglib-2.0.so.0(g_hook_list_marshal+0x58)[0xffff8c6b9608]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(+0x929f4)[0xffff8c8e29f4]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(+0x9486c)[0xffff8c8e486c]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(+0x97cb8)[0xffff8c8e7cb8]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(gst_pad_push+0x124)[0xffff8c8e80e8]
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstcoreelements.so(+0x436fc)[0xffff7c6036fc]
Unable to set device in gst_nvstreammux_src_collect_buffers

Pipeline graph:

The pipeline works fine when I remove the tee that is just after the streammux, but, as expected, the buffers for the inactive pads of nvdsmetamux are then dropped.
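
For orientation, here is a rough Python sketch of the kind of wiring involved (illustrative only; the element and pad names are placeholders, not the exact graph from the attached image): the tee after nvstreammux feeds nvdsmetamux's sink_0 directly so it always has a buffer, while a second tee branch feeds the output-selector in front of the model branches.

```python
#!/usr/bin/env python3
# Illustrative wiring sketch only -- not the exact application pipeline.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("sketch")

streammux = Gst.ElementFactory.make("nvstreammux", "mux")
tee = Gst.ElementFactory.make("tee", "tee_streammux")
selector = Gst.ElementFactory.make("output-selector", "selector")
metamux = Gst.ElementFactory.make("nvdsmetamux", "metamux")
for e in (streammux, tee, selector, metamux):
    pipeline.add(e)

streammux.link(tee)

# One tee branch goes straight to metamux sink_0 (assuming nvdsmetamux exposes
# request sink pads named sink_%u, as in the parallel inference sample).
tee.get_request_pad("src_%u").link(metamux.get_request_pad("sink_0"))

# The other tee branch feeds the output-selector; its src pads would lead to
# the per-model nvinfer branches and from there into metamux sink_1 / sink_2.
tee.get_request_pad("src_%u").link(selector.get_static_pad("sink"))
```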

I referred to the following post to build my pipeline.

Let’s focus on the C version first.

  1. Did the app crash at the beginning or after a while? How did you use output-selector?
  2. From the error “Unable to set device”, there was a CUDA error. Could you share the complete log?
  3. Could you simplify the code to narrow down this issue? For example, add only a fakesink after nvdsmetamux.
  1. I have a probe function on each nvinfer src pad (a sketch is at the end of this post), and I can see that at least 2 buffers from each nvinfer were pushed; after that the pipeline freezes. I am also using nvurisrcbin with reconnection-interval 60, and after the freeze I only see reconnection logs, while all other logs (including GST_DEBUG) stop.
  2. I have tried setting GST_DEBUG=nvstreammux:9, but I don’t see any error. I will share the GST_DEBUG logs; are there any other logs I should check?
  3. Sure, I will share the logs of the simplified pipeline; allow me a few hours.

Also, our main system will be using Python, so is it possible to focus on debugging the Python issue?
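
A minimal sketch of the kind of src-pad probe mentioned above (placeholder element names; not the actual app code):

```python
#!/usr/bin/env python3
# Minimal sketch: count buffers leaving an nvinfer src pad so a freeze shows
# up as a counter that stops increasing. "pgie_0" is a placeholder name.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

def make_counter_probe(name, counts):
    def probe(pad, info, _data):
        counts[name] = counts.get(name, 0) + 1
        return Gst.PadProbeReturn.OK   # never drop or block, just observe
    return probe

counts = {}
pgie = Gst.ElementFactory.make("nvinfer", "pgie_0")
pgie.get_static_pad("src").add_probe(
    Gst.PadProbeType.BUFFER, make_counter_probe("pgie_0", counts), None)
```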

The pipeline you shared does not support parallel inference, because only one nvinfer will do inference at a time. Please refer to the ready-made deepstream_parallel_inference_app sample. You can dump the pipeline graph, then port the code to a Python version.

I have referred to deepstream_parallel_inference_app, and I also know that the pipeline I have built will not perform “parallel inference”, because I do not want parallel inference. I have a probe on the sink pad of output-selector and I want to pass alternate buffers to each model, which I am able to do without the Tee plugin (tee_streammux). The problem with that is that the metamuxer has only one active pad (let’s say sink_0), so the buffers from sink_1 and sink_2 are dropped because there is no buffer on sink_0 at that time (when using output-selector). I then referred to this post, which suggested using a Tee that passes all the buffers to metamux, to handle these buffer drops.
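
A minimal sketch of that alternating logic, assuming one requested output-selector src pad per model branch (names are illustrative, and whether the switch applies to the current or the next buffer depends on the element's behaviour):

```python
#!/usr/bin/env python3
# Minimal sketch: round-robin routing on an output-selector sink pad.
# One requested src pad per nvinfer branch; names are illustrative.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

selector = Gst.ElementFactory.make("output-selector", "selector")
# In the real pipeline each of these src pads links to a queue in front of
# the corresponding nvinfer instance.
branch_pads = [selector.get_request_pad("src_%u") for _ in range(3)]
state = {"idx": 0}

def route_alternate(pad, info, _data):
    # Point "active-pad" at the next branch so consecutive batches go to
    # model 0, 1, 2, 0, 1, 2, ...
    selector.set_property("active-pad", branch_pads[state["idx"]])
    state["idx"] = (state["idx"] + 1) % len(branch_pads)
    return Gst.PadProbeReturn.OK

selector.get_static_pad("sink").add_probe(
    Gst.PadProbeType.BUFFER, route_alternate, None)
```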

I have tried to simplify the pipeline, but I am still getting a similar error.

debug.txt (25.6 KB)
debug_python.txt (45.1 KB)

In Python I can observe that a few buffers were passed to metamux, but as soon as a buffer is pushed from metamux the pipeline freezes. In C++, the pipeline crashed without any buffer being processed by nvinfer (the probe function on the src pad was not executed).

  1. If you don’t want parallel inference, nvdsmetamux is not needed, because it is used to merge metadata from the inference results of different models. From the pipeline, it seems the config is not set for nvdsmetamux.
  2. From the error “Unable to set device in gst_nvvideoconvert_transform”, did you set gpu-id to a value that is not zero? On Jetson, gpu-id needs to be zero (see the snippet after this list).
  3. If the crash issue persists, you can continue to simplify the pipeline to check which element caused the crash. For example, can “…->nvstreammux->nvinfer->fakesink” run well? Can “…->nvstreammux->…output-selector->…->nvinfer->fakesink” run well?
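
For example (illustrative snippet; the variable names and config path are placeholders for whatever your app uses):

```python
#!/usr/bin/env python3
# Illustrative: give nvdsmetamux a config file and keep gpu-id at 0 on Jetson.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

metamux = Gst.ElementFactory.make("nvdsmetamux", "metamux")
metamux.set_property("config-file", "config_metamux.txt")  # placeholder path

nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "conv")
pgie = Gst.ElementFactory.make("nvinfer", "pgie_0")
for elem in (nvvidconv, pgie):
    elem.set_property("gpu-id", 0)  # Jetson has a single integrated GPU: id 0
```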

Is there any alternative plugin that can be used in place of nvdsmetamux? My requirement is simple: to push the buffers from these 3 nvinfer instances into one single branch (i.e. to a tracker). I tried using funnel, but I cannot see any bboxes when rendering using nvdsosd and nv3dsink.

  1. What are the three models used to do, respectively?
  2. If you don’t want parallel inference, you can use nvinfer in sequence, like “nvstreammux->nvinfer(1)->nvinfer(1)->nvinfer(1)->…”; the interval/secondary-reinfer-interval property of nvinfer can be used to control the inference interval (see the sketch below).
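
An illustrative sequential arrangement (config file paths and property values are placeholders):

```python
#!/usr/bin/env python3
# Illustrative: three primary-mode nvinfer instances chained back to back,
# with "interval" used to skip batches per model. Paths are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("sequential")

prev = Gst.ElementFactory.make("nvstreammux", "mux")
pipeline.add(prev)
for i, cfg in enumerate(["model_a.txt", "model_b.txt", "model_c.txt"]):
    pgie = Gst.ElementFactory.make("nvinfer", f"pgie_{i}")
    pgie.set_property("config-file-path", cfg)  # placeholder config files
    pgie.set_property("unique-id", i + 1)       # distinct id per model
    pgie.set_property("interval", 2)            # skip 2 batches between inferences
    pipeline.add(pgie)
    prev.link(pgie)
    prev = pgie
```
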
  1. All three models are object detection models; they all operate in primary mode.
  2. I tried sequential processing, but it increases the latency. To handle the latency issue I implemented the parallel inference pipeline, but what we have observed is that “dequeueOutputAndAttachMeta” has increased 3x compared to the sequential inference pipeline: in the sequential pipeline it used to take 80 ms for each model (even that is too much), and now in the parallel inference pipeline it takes 180-200 ms.
  1. Please refer to this link for performance improvement.
  2. If you don’t want parallel inference, please use nvinfer in sequence; please refer to the sample deepstream-test2. If you want parallel inference, please refer to the ready-made sample deepstream_parallel_inference_app, which does real parallel inference. You can read the graph in the README first.
  1. I have already referred to that doc.
  2. I have already implemented a pipeline that does “real parallel inference”; I am attaching the graph of the pipeline for your reference.

The “dequeueOutputAndAttachMeta” step of nvinfer in this pipeline takes 3x more time than with sequential nvinfer.

Thanks for sharing! The graph is not clear; could you share a zip file? Can this run well?
“dequeueOutputAndAttachMeta” includes inference, postprocessing, and attaching metadata. Did you add custom postprocessing? Since the nvinfer plugin is open source, you can add Nsight measurement code to check which part costs too much time.

parallel-inference.zip (535.4 KB)
The above pipeline works just fine, but “dequeueOutputAndAttachMeta” is causing a bottleneck in the pipeline.

We are using this repo, and it does have a custom parse-bbox-func. But after digging through a few blogs on the internet, it appears the postprocessing in nvinfer is sequential rather than parallel across the frames in a batch? That would make sense, because when the number of streams is increased, the time taken by this particular step increases (and not linearly).

Thanks for sharing! From the pipeline you shared, the latency is because the device runs three models at the same time. The pipeline is a special case of deepstream_parallel_inference_app, which supports choosing sources for the specified model.

the latency is because the device runs three models at the same time

I can understand that during inference, but attaching metadata should not affect the latency; my GPU/CPU are not even fully utilized.

The pipeline is a special case of deepstream_parallel_inference_app, which supports choosing sources for the specified model.

How? I am passing the same batch to all the nvinfer instances.

I mean, deepstream_parallel_inference_app supports choosing which sources to run inference on. Taking this cfg for example, there are 4 sources, and only sources 0, 1, and 2 are chosen for inference by pgie0. In your pipeline, all sources are chosen for inference by each nvinfer.

Anyway, is there any update on the original issue?

  1. From the pipeline, it seems the config is not set for nvdsmetamux.
  2. To narrow down this issue, can “output-selector->queue->nvinfer->fakesink” run well? Can “output-selector->queue->nvinfer->nvdsmetamux->fakesink” run well? (A sketch of the first check is below.)
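
For reference, the first check could be wired roughly like this (element names and the config path are placeholders):

```python
#!/usr/bin/env python3
# Illustrative bisection step: only output-selector -> queue -> nvinfer ->
# fakesink downstream of the existing streammux. If this runs, insert
# nvdsmetamux between nvinfer and fakesink and re-test.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("bisect")

selector = Gst.ElementFactory.make("output-selector", "selector")
queue = Gst.ElementFactory.make("queue", "q0")
pgie = Gst.ElementFactory.make("nvinfer", "pgie_0")
pgie.set_property("config-file-path", "pgie_config.txt")  # placeholder
sink = Gst.ElementFactory.make("fakesink", "sink")
sink.set_property("sync", False)
for e in (selector, queue, pgie, sink):
    pipeline.add(e)

selector.get_request_pad("src_%u").link(queue.get_static_pad("sink"))
queue.link(pgie)
pgie.link(sink)
```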

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.