Issues when using tee with output-selector

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson Orin NX 16GB
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only): 6.1
• TensorRT Version: 10.7
• Issue Type (questions, new requirements, bugs): Questions, bugs
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing): deepstream_parallel_infer_app

I have tried to modify the DeepStream parallel inference app pipeline with some custom modifications as per my requirements, in both Python and C++. The problem is that the Python app freezes without any error, and the C++ app crashes with the following error:

deepstream-pipeline(+0x7a88)[0xaaaadacf7a88]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(+0x925a8)[0xffff8c8e25a8]
/lib/aarch64-linux-gnu/libglib-2.0.so.0(g_hook_list_marshal+0x58)[0xffff8c6b9608]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(+0x929f4)[0xffff8c8e29f4]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(+0x9486c)[0xffff8c8e486c]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(+0x97cb8)[0xffff8c8e7cb8]
/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0(gst_pad_push+0x124)[0xffff8c8e80e8]
/usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstcoreelements.so(+0x436fc)[0xffff7c6036fc]
Unable to set device in gst_nvstreammux_src_collect_buffers

Pipeline graph:

The pipeline works fine when I remove the tee that is just after the streammux, but, as expected, the buffers for the inactive pads of nvdsmetamux are then dropped.
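
For orientation, here is a rough Python sketch of the kind of wiring involved (illustrative only; the element and pad names are placeholders, not the exact graph from the attached image): the tee after nvstreammux feeds nvdsmetamux's sink_0 directly so it always has a buffer, while a second tee branch feeds the output-selector in front of the model branches.

```python
#!/usr/bin/env python3
# Illustrative wiring sketch only -- not the exact application pipeline.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("sketch")

streammux = Gst.ElementFactory.make("nvstreammux", "mux")
tee = Gst.ElementFactory.make("tee", "tee_streammux")
selector = Gst.ElementFactory.make("output-selector", "selector")
metamux = Gst.ElementFactory.make("nvdsmetamux", "metamux")
for e in (streammux, tee, selector, metamux):
    pipeline.add(e)

streammux.link(tee)

# One tee branch goes straight to metamux sink_0 (assuming nvdsmetamux exposes
# request sink pads named sink_%u, as in the parallel inference sample).
tee.get_request_pad("src_%u").link(metamux.get_request_pad("sink_0"))

# The other tee branch feeds the output-selector; its src pads would lead to
# the per-model nvinfer branches and from there into metamux sink_1 / sink_2.
tee.get_request_pad("src_%u").link(selector.get_static_pad("sink"))
```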

I referred to the following post to build my pipeline.

Let’s focus on the C version first.

  1. Did the app crash at the beginning or after a while? How did you use output-selector?
  2. From the error “Unable to set device”, there was a CUDA error. Could you share the complete log?
  3. Could you simplify the code to narrow down this issue? For example, add only a fakesink after nvdsmetamux.
  1. I have a probe function on each nvinfer src pad (a sketch is at the end of this post), and I can see that at least 2 buffers from each nvinfer were pushed; after that the pipeline freezes. I am also using nvurisrcbin with reconnection-interval 60, and after the freeze I only see reconnection logs, while all other logs (including GST_DEBUG) stop.
  2. I have tried setting GST_DEBUG=nvstreammux:9, but I don’t see any error. I will share the GST_DEBUG logs; are there any other logs I should check?
  3. Sure, I will share the logs of the simplified pipeline; allow me a few hours.

Also, our main system will be using Python, so is it possible to focus on debugging the Python issue?
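
A minimal sketch of the kind of src-pad probe mentioned above (placeholder element names; not the actual app code):

```python
#!/usr/bin/env python3
# Minimal sketch: count buffers leaving an nvinfer src pad so a freeze shows
# up as a counter that stops increasing. "pgie_0" is a placeholder name.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

def make_counter_probe(name, counts):
    def probe(pad, info, _data):
        counts[name] = counts.get(name, 0) + 1
        return Gst.PadProbeReturn.OK   # never drop or block, just observe
    return probe

counts = {}
pgie = Gst.ElementFactory.make("nvinfer", "pgie_0")
pgie.get_static_pad("src").add_probe(
    Gst.PadProbeType.BUFFER, make_counter_probe("pgie_0", counts), None)
```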

The pipeline you shared does not support parallel inference, because only one nvinfer will do inference at a time. Please refer to the ready-made deepstream_parallel_inference_app sample. You can dump the pipeline graph, then port the code to a Python version.

I have referred to deepstream_parallel_inference_app, and I also know that the pipeline I have built will not perform “parallel inference”, because I do not want parallel inference. I have a probe on the sink pad of output-selector and I want to pass alternate buffers to each model, which I am able to do without the Tee plugin (tee_streammux). The problem with that is that the metamuxer has only one active pad (let’s say sink_0), so the buffers from sink_1 and sink_2 are dropped because there is no buffer on sink_0 at that time (when using output-selector). I then referred to this post, which suggested using a Tee that passes all the buffers to metamux, to handle these buffer drops.
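
A minimal sketch of that alternating logic, assuming one requested output-selector src pad per model branch (names are illustrative, and whether the switch applies to the current or the next buffer depends on the element's behaviour):

```python
#!/usr/bin/env python3
# Minimal sketch: round-robin routing on an output-selector sink pad.
# One requested src pad per nvinfer branch; names are illustrative.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

selector = Gst.ElementFactory.make("output-selector", "selector")
# In the real pipeline each of these src pads links to a queue in front of
# the corresponding nvinfer instance.
branch_pads = [selector.get_request_pad("src_%u") for _ in range(3)]
state = {"idx": 0}

def route_alternate(pad, info, _data):
    # Point "active-pad" at the next branch so consecutive batches go to
    # model 0, 1, 2, 0, 1, 2, ...
    selector.set_property("active-pad", branch_pads[state["idx"]])
    state["idx"] = (state["idx"] + 1) % len(branch_pads)
    return Gst.PadProbeReturn.OK

selector.get_static_pad("sink").add_probe(
    Gst.PadProbeType.BUFFER, route_alternate, None)
```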

I have tried to simplify the pipeline, but I am still getting a similar error.

debug.txt (25.6 KB)
debug_python.txt (45.1 KB)

In Python I can observe that a few buffers were passed to metamux, but as soon as a buffer is pushed from metamux the pipeline freezes. In C++, the pipeline crashed without any buffer being processed by nvinfer (the probe function on the src pad was not executed).

  1. If you don’t want parallel inference, nvdsmetamux is not needed, because it is used to merge metadata from the inference results of different models. From the pipeline, it seems the config is not set for nvdsmetamux.
  2. From the error “Unable to set device in gst_nvvideoconvert_transform”, did you set gpu-id to a value that is not zero? On Jetson, gpu-id needs to be zero (see the snippet after this list).
  3. If the crash issue persists, you can continue to simplify the pipeline to check which element caused the crash. For example, can “…->nvstreammux->nvinfer->fakesink” run well? Can “…->nvstreammux->…output-selector->…->nvinfer->fakesink” run well?
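
For example (illustrative snippet; the variable names and config path are placeholders for whatever your app uses):

```python
#!/usr/bin/env python3
# Illustrative: give nvdsmetamux a config file and keep gpu-id at 0 on Jetson.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

metamux = Gst.ElementFactory.make("nvdsmetamux", "metamux")
metamux.set_property("config-file", "config_metamux.txt")  # placeholder path

nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "conv")
pgie = Gst.ElementFactory.make("nvinfer", "pgie_0")
for elem in (nvvidconv, pgie):
    elem.set_property("gpu-id", 0)  # Jetson has a single integrated GPU: id 0
```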

Is there any alternative plugin that can be used in place of nvdsmetamux? My requirement is simple: to push the buffers from these 3 nvinfer instances into one single branch (i.e. to a tracker). I tried using funnel, but I cannot see any bboxes when rendering using nvdsosd and nv3dsink.

  1. What are the three models used to do, respectively?
  2. If you don’t want parallel inference, you can use nvinfer in sequence, like “nvstreammux->nvinfer(1)->nvinfer(1)->nvinfer(1)->…”; the interval/secondary-reinfer-interval property of nvinfer can be used to control the inference interval (see the sketch below).
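
An illustrative sequential arrangement (config file paths and property values are placeholders):

```python
#!/usr/bin/env python3
# Illustrative: three primary-mode nvinfer instances chained back to back,
# with "interval" used to skip batches per model. Paths are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("sequential")

prev = Gst.ElementFactory.make("nvstreammux", "mux")
pipeline.add(prev)
for i, cfg in enumerate(["model_a.txt", "model_b.txt", "model_c.txt"]):
    pgie = Gst.ElementFactory.make("nvinfer", f"pgie_{i}")
    pgie.set_property("config-file-path", cfg)  # placeholder config files
    pgie.set_property("unique-id", i + 1)       # distinct id per model
    pgie.set_property("interval", 2)            # skip 2 batches between inferences
    pipeline.add(pgie)
    prev.link(pgie)
    prev = pgie
```
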
  1. All three models are object detection models; they all operate in primary mode.
  2. I tried sequential processing, but it increases the latency. To handle the latency issue I implemented the parallel inference pipeline, but what we have observed is that “dequeueOutputAndAttachMeta” has increased 3x compared to the sequential inference pipeline: in the sequential pipeline it used to take 80 ms for each model (even that is too much), and now in the parallel inference pipeline it takes 180-200 ms.
  1. Please refer to this link for performance improvement.
  2. If you don’t want parallel inference, please use nvinfer in sequence; please refer to the sample deepstream-test2. If you want parallel inference, please refer to the ready-made sample deepstream_parallel_inference_app, which does real parallel inference. You can read the graph in the README first.
  1. I have already referred to that doc.
  2. I have already implemented a pipeline that does “real parallel inference”; I am attaching the graph of the pipeline for your reference.

The “dequeueOutputAndAttachMeta” step of nvinfer in this pipeline takes 3x more time than with sequential nvinfer.

Thanks for sharing! The graph is not clear; could you share a zip file? Can this run well?
“dequeueOutputAndAttachMeta” includes inference, postprocessing, and attaching metadata. Did you add custom postprocessing? Since the nvinfer plugin is open source, you can add Nsight measurement code to check which part costs too much time.

parallel-inference.zip (535.4 KB)
The above pipeline works just fine, but “dequeueOutputAndAttachMeta” is causing a bottleneck in the pipeline.

We are using this repo, and it does have a custom parse-bbox-func. But after digging through a few blogs on the internet, it appears the postprocessing in nvinfer is sequential rather than parallel across the frames in a batch? That would make sense, because when the number of streams is increased, the time taken by this particular step increases (and not linearly).

Thanks for sharing! From the pipeline you shared, the latency is because the device runs three models at the same time. The pipeline is a special case of deepstream_parallel_inference_app, which supports choosing sources for the specified model.

the latency is because the device runs three models at the same time

I can understand that during inference, but attaching metadata should not affect the latency; my GPU/CPU are not even fully utilized.

The pipeline is a special case of deepstream_parallel_inference_app, which supports choosing sources for the specified model.

How? I am passing the same batch to all the nvinfer instances.

I mean, deepstream_parallel_inference_app supports choosing which sources to run inference on. Taking this cfg for example, there are 4 sources, and only sources 0, 1, and 2 are chosen for inference by pgie0. In your pipeline, all sources are chosen for inference by each nvinfer.

Anyway, is there any update on the original issue?

  1. From the pipeline, it seems the config is not set for nvdsmetamux.
  2. To narrow down this issue, can “output-selector->queue->nvinfer->fakesink” run well? Can “output-selector->queue->nvinfer->nvdsmetamux->fakesink” run well? (A sketch of the first check is below.)
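
For reference, the first check could be wired roughly like this (element names and the config path are placeholders):

```python
#!/usr/bin/env python3
# Illustrative bisection step: only output-selector -> queue -> nvinfer ->
# fakesink downstream of the existing streammux. If this runs, insert
# nvdsmetamux between nvinfer and fakesink and re-test.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("bisect")

selector = Gst.ElementFactory.make("output-selector", "selector")
queue = Gst.ElementFactory.make("queue", "q0")
pgie = Gst.ElementFactory.make("nvinfer", "pgie_0")
pgie.set_property("config-file-path", "pgie_config.txt")  # placeholder
sink = Gst.ElementFactory.make("fakesink", "sink")
sink.set_property("sync", False)
for e in (selector, queue, pgie, sink):
    pipeline.add(e)

selector.get_request_pad("src_%u").link(queue.get_static_pad("sink"))
queue.link(pgie)
pgie.link(sink)
```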

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.