Custom Autoencoder Model

Hello,

I’ve got a model trained in TF, converted to ONNX, which is an autoencoder used for anomaly detection using reconstruction error.

I’m not sure how to proceed to integrate this custom model to DeepStream on Jetson Xavier, would appreciate any help!

I looked through Using a Custom Model with DeepStream — DeepStream 6.4 documentation (nvidia.com), but I’m still a bit lost.

As far as I understand, the model will be run using nvinfer. How can I check/debug the output of each DS plugin (in Python)?
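(For reference, a common way to inspect what a given plugin produces is to attach a buffer probe on its src pad and read the batch metadata there; a rough Python sketch, where the element name `pgie` and the printed fields are only illustrative:)

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def src_pad_probe(pad, info, user_data):
    # Called for every buffer leaving this pad; print what the plugin attached.
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Batch metadata only exists downstream of nvstreammux.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if batch_meta:
        l_frame = batch_meta.frame_meta_list
        while l_frame is not None:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            print(f"{pad.get_parent_element().get_name()}: frame {frame_meta.frame_num}, "
                  f"source {frame_meta.source_id}, objects {frame_meta.num_obj_meta}")
            try:
                l_frame = l_frame.next
            except StopIteration:
                break
    return Gst.PadProbeReturn.OK

# `pgie` is whichever element you want to inspect, e.g. the nvinfer instance.
pgie.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, src_pad_probe, None)
```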

Thanks : )

ETA:

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson Xavier AGX
• DeepStream Version
6.3
• JetPack Version (valid for Jetson only)
5.1.2
• Issue Type( questions, new requirements, bugs)
Questions

Can you describe what the input layers and output layers of the model are?

Do you know how the model output will be used?

The internal network structure is not important for DeepStream. Please make sure you know the input and output of the model.

Hi Fiona,

Thanks for helping!

Yes, of course; the input of the model will be either RGB images or grayscale images (we have various models planned for testing).
The initial output of the model will be a reconstruction of the input, i.e. another image.

We will compute the reconstruction error of the image to detect anomalous ones.
Would this be doable out of the box with DeepStream?

We would also like DeepStream to output some metadata (e.g. timings, detections), as well as the model input and the model output around the detection timings.
Is this feasible?

Thanks!

What is the format and resolution of the output image? Is it the same as the input RGB/grayscale images?

What do you mean by “timings” and “detections”? What do you mean by “around the detection timings”?

Yes, the resolution is ~2000x1400 at the moment, but this is less important right now.
We’re still actively figuring out how to crop/resize in preprocessing in DeepStream, or whether the ROI option in preprocessing is enough.

The idea is to have RTSP stream(s), but we’re only interested in (and the model is trained on) a cropped part of the stream (shown in green in the attached diagram).
The model receives the cropped image, and outputs a frame of the same format/resolution as the cropped image.
Input and output of the model are compared for reconstruction error.
If this error is greater than a defined threshold, we consider the frame at time T to be a detection.

Optimally, the pipeline will be fed several RTSP streams, each with its own cropping/resizing.

We want DeepStream to output:

  • A video sequence of the original input (or cropped input), ±X frames before/after a detection occurs
    • We need to buffer X frames of the stream (see the sketch below)
  • Metadata: in our case, the timestamp at which a detection happens
    • And maybe some other info we compute
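For the buffering, what we have in mind is roughly the following (X, the frame objects, and write_clip are placeholders):

```python
from collections import deque

X = 30  # number of context frames before/after a detection (placeholder value)

pre_buffer = deque(maxlen=X)   # rolling buffer of the last X frames
post_remaining = 0             # frames still to collect after a detection
clip = []                      # frames of the clip currently being assembled

def on_frame(frame, is_detection):
    """Call once per (cropped) frame, in display order."""
    global post_remaining, clip
    if post_remaining > 0:
        clip.append(frame)
        post_remaining -= 1
        if post_remaining == 0:
            write_clip(clip)   # placeholder: encode/save the +/-X frame window
            clip = []
    elif is_detection:
        clip = list(pre_buffer) + [frame]
        post_remaining = X
    pre_buffer.append(frame)
```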

Questions:

  • Can we deploy such a model on DeepStream? How?
  • Processing:
    • Can we crop the input as desired, or would the ROI config of nvdspreprocess be sufficient?
    • Can we do the postprocessing of comparing the model input to the model output to compute the reconstruction error?
  • Outputs:
    • How do we output the desired data (video ±X frames before/after a detection)?
    • Metadata (reconstruction error result, timestamp, …)

Thank you! :)

I’m asking whether the model input image and model output image are in the same format and resolution. You only need to consider the model input and output to figure out how to use gst-nvinfer.

From your graph, seems you want to get the reconstruction error of specific ROI in the video frames. Will the ROI change from frame to frame? Does your model support batched input? If there are batched images input, are there batched images output from the model? Do the batched frames have the same ROI or different ROIs?

Yes, I’m aware; I was replying that they are indeed the same resolution.
With the caveat that we’re still looking into how to crop from the RTSP stream resolution beforehand.

  • No, the ROI is fixed per camera (constant ROI for all frames of a given camera)
  • Yes, but we don’t plan to use batched input as of yet (should we?)

So you can deploy your model with gst-nvinfer and parse the output image into the customized frame user meta.

Since your model only needs RGB/grayscale images for the ROI, the nvvideoconvert plugin can do the format conversion to RGB/grayscale and the ROI crop before gst-nvinfer. And then you can get the ROI image directly from the gst-nvinfer output GstBuffer.
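For example, roughly like this when building the pipeline in Python (untested sketch; the ROI values, element names, and exact placement relative to nvstreammux are placeholders to adapt):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Crop the ROI out of the full frame and convert it to RGBA before gst-nvinfer.
# src-crop takes "left:top:width:height" in pixels of the source frame.
conv = Gst.ElementFactory.make("nvvideoconvert", "roi-crop")
conv.set_property("src-crop", "600:300:512:512")  # placeholder ROI, set per camera

caps = Gst.ElementFactory.make("capsfilter", "roi-caps")
caps.set_property("caps",
                  Gst.Caps.from_string("video/x-raw(memory:NVMM), format=RGBA"))

# ... source ! conv ! caps ! nvstreammux ! nvinfer ...
```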
There is sample code showing how to get raw data from the NvBufSurface: DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums
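In Python that looks roughly like this inside a probe on the gst-nvinfer src pad (sketch only; get_nvds_buf_surface expects RGBA surfaces on Jetson, and unmap_nvds_buf_surface is needed with recent bindings):

```python
import numpy as np
import pyds
from gi.repository import Gst

def infer_src_pad_probe(pad, info, user_data):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # Map the NvBufSurface for this frame as a numpy array (RGBA on Jetson).
        surface = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        model_input = np.array(surface, copy=True, order="C")  # copy out of NVMM
        pyds.unmap_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        # ... keep model_input to compare against the model's reconstruction ...
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```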

To get the model output image, please set “output-tensor-meta=1” and “network-type=100” in the nvinfer configuration file to enable customized output tensor parsing. See Gst-nvinfer — DeepStream 6.4 documentation.
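A minimal nvinfer config along these lines (the model file name and values are placeholders):

```ini
[property]
# Exported model; the TensorRT engine is generated from it on first run
onnx-file=autoencoder.onnx
batch-size=1
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
# 100 = "other": nvinfer skips its built-in detector/classifier parsing
network-type=100
# Attach the raw output tensors to the metadata as NvDsInferTensorMeta
output-tensor-meta=1
# 0=RGB, 1=BGR, 2=GRAY
model-color-format=0
```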

And please parse the output tensor in your app:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html#tensor-metadata
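In Python that parsing can be done in a pad probe on the gst-nvinfer src pad, roughly like this (sketch only: it assumes a single float32 output layer, and the output shape, the availability of pyds.get_ptr(), and the threshold are things to adapt to your model):

```python
import ctypes
import numpy as np
import pyds
from gi.repository import Gst

ERROR_THRESHOLD = 0.01  # placeholder anomaly threshold

def tensor_meta_probe(pad, info, user_data):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                # Assumption: the reconstruction is the first (only) output layer.
                layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)
                ptr = ctypes.cast(pyds.get_ptr(layer.buffer),
                                  ctypes.POINTER(ctypes.c_float))
                # Shape is a placeholder; use your model's real output dimensions.
                reconstruction = np.ctypeslib.as_array(ptr, shape=(512, 512, 3)).copy()
                # model_input would be the preprocessed ROI saved earlier, in the
                # same range/scale as the model output.
                # error = np.mean((model_input - reconstruction) ** 2)
                # if error > ERROR_THRESHOLD: mark frame_meta.frame_num as a detection
            try:
                l_user = l_user.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```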

There is a customized frame user meta sample at /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-user-metadata-test

Hi Fiona,

Thank you, I will check those out!
Will come back with questions if any arise.

Thanks a lot !

I’m closing this topic since there has been no update from you for a while, assuming the issue was resolved.
If you still need support, please open a new topic. Thanks
