Yes of course; the input of the model will be either RGB images, or grayscale images (we have various models planned for testing).
The initial output of the model will be a reconstruction of the input, i.e. another image.
We will compute the reconstruction error of the image to detect anomalous frames.
Would this be doable out of the box with DeepStream?
We would also like DeepStream to output some metadata (e.g. timings, detections), as well as the model input and the model output at the detection timings.
Is this feasible?
Yes, the resolution is ~2000x1400 at the moment, but this is less important right now.
That said, we’re still actively figuring out how to crop/resize during preprocessing in DeepStream, or whether the ROI option in preprocessing is enough.
The idea is to have RTSP stream(s), but we’re only interested (and the model is trained) on a cropped part of the stream (green).
The model receives the cropped image, and outputs a frame of the same format/resolution as the cropped image.
Input and output of the model are compared for reconstruction error.
If this error is greater than a defined threshold, we consider the frame at time T to be a detection.
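To make the decision concrete, this is roughly what we have in mind (plain numpy, independent of DeepStream; the threshold value is just a placeholder):

```python
import numpy as np

ERROR_THRESHOLD = 0.01  # placeholder, tuned per camera/model

def is_anomalous(model_input: np.ndarray, reconstruction: np.ndarray) -> bool:
    """Return True when the mean squared reconstruction error exceeds the threshold."""
    # Both arrays are expected in the same format and resolution (see above).
    err = np.mean((model_input.astype(np.float32) - reconstruction.astype(np.float32)) ** 2)
    return err > ERROR_THRESHOLD
```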
Ideally, the pipeline will be fed several RTSP streams, each with its own cropping/resizing.
We want DeepStream to output:
A video sequence of the original input (or cropped input), ±X frames before/after a detection occurs
We need to buffer X frames of the stream (see the buffering sketch after this list)
Metadata: in our case, the timestamp at which the detection happens
And maybe some other info we compute
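For the buffering, something like this simple ring buffer is what we picture (plain Python, names are ours, not DeepStream API):

```python
from collections import deque

X = 30  # number of frames to keep before/after a detection (placeholder)

class DetectionClipBuffer:
    def __init__(self, x: int = X):
        self.pre = deque(maxlen=x)  # rolling window of the last x frames
        self.post_remaining = 0
        self.clip = []              # frames of the clip currently being built
        self.x = x

    def push(self, frame, detected: bool):
        """Feed every frame; returns a finished clip (list of frames) or None."""
        finished = None
        if self.post_remaining > 0:
            # still collecting the "after" part of a clip
            self.clip.append(frame)
            self.post_remaining -= 1
            if self.post_remaining == 0:
                finished, self.clip = self.clip, []
        elif detected:
            # start a clip from the buffered pre-detection frames
            self.clip = list(self.pre) + [frame]
            self.post_remaining = self.x
        self.pre.append(frame)  # keep the rolling pre-detection window up to date
        return finished
```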
Questions:
Can we deploy such a model on DeepStream? How?
Processing:
Can we crop the input as desired, or would the ROI config of nvdspreprocess be sufficient?
Can we do the postprocessing of comparing model input to model output to compute the reconstruction error?
Outputs:
How do we output the desired data? (video ±X frames before/after a detection)
Metadata (reconstruction error result, timestamp, …)
I’m asking whether the model input image and model output image are in the same format and resolution. You only need to consider the model input and output to figure out how to use gst-nvinfer.
From your graph, seems you want to get the reconstruction error of specific ROI in the video frames. Will the ROI change from frame to frame? Does your model support batched input? If there are batched images input, are there batched images output from the model? Do the batched frames have the same ROI or different ROIs?
Yes, I’m aware; I was replying that yes, they’re the same resolution.
With the caveat that we’re still looking into how to crop from the RTSP stream resolution beforehand.
No, ROI is fixed per camera (constant ROI for all frames of given camera)
Yes, but we’re not planning to use batched input as of yet (should we?)
So you can deploy your model with gst-nvinfer and parse the output image into the customized frame user meta.
Since your model only needs RGB/grayscale images of the ROI, the nvvideoconvert plugin can do the format conversion to RGB/grayscale and the ROI crop before gst-nvinfer. Then you can get the ROI image directly from the gst-nvinfer output GstBuffer.
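For example, something along these lines (an untested sketch; the stream URI, src-crop value, crop size and config file name are placeholders, please check the Gst-nvvideoconvert / Gst-nvinfer docs of your DeepStream version for the exact property names):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

RTSP_URI = "rtsp://camera/stream"  # placeholder
CROP = "500:300:960:544"           # left:top:width:height of the fixed ROI (placeholder)

pipeline = Gst.parse_launch(
    f"uridecodebin uri={RTSP_URI} ! "
    f"nvvideoconvert src-crop={CROP} ! "       # ROI crop + format conversion on the GPU
    "video/x-raw(memory:NVMM),format=RGBA ! "  # RGB-like input for the model
    "mux.sink_0 nvstreammux name=mux batch-size=1 width=960 height=544 ! "
    "nvinfer config-file-path=autoencoder_config.txt ! "  # hypothetical config file
    "fakesink"
)
pipeline.set_state(Gst.State.PLAYING)
```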
There is sample code of how to get raw data from the NvBufSurface: DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums
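If you use the Python bindings (pyds), the postprocessing probe could look roughly like this, assuming output-tensor-meta=1 is set in the gst-nvinfer config so the raw model output is attached as NvDsInferTensorMeta. The output shape, layer index and the is_anomalous() helper are assumptions from the discussion above, not DeepStream APIs:

```python
import ctypes
import numpy as np
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

OUTPUT_SHAPE = (3, 544, 960)  # assumed CHW shape of the autoencoder output

def infer_src_pad_probe(pad, info, user_data):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # Cropped input frame as a numpy array (needs an RGBA NVMM surface).
        in_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)

        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)  # single output layer assumed
                ptr = ctypes.cast(pyds.get_ptr(layer.buffer),
                                  ctypes.POINTER(ctypes.c_float))
                recon = np.ctypeslib.as_array(ptr, shape=OUTPUT_SHAPE)
                # Compare in_frame (after the same preprocessing the model saw)
                # with recon, e.g. with the is_anomalous() sketch above, and
                # record the frame timestamp as the detection time.
            l_user = l_user.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK
```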
I’m closing this topic since there has been no update from you for a while, assuming this issue was resolved.
If you still need support, please open a new topic. Thanks