How to use my custom semantic segmentation model?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) : RTX 3060
• DeepStream Version : 7.0
• TensorRT Version :
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs) : question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

How can I use a custom semantic segmentation model in DeepStream 7.0?

  1. Semantic segmentation model input and output
  • 0 INPUT kFLOAT input.1 3x512x512
  • 1 OUTPUT kFLOAT 1145 1x512x512
  2. custom_config_infer.txt
[property]
gpu-id=0
gie-unique-id=1
interval=0

batch-size=3
net-scale-factor=0.003921569
model-color-format=0
infer-dims=3;512;512

onnx-file=../../../../tritonserver/models/customnet/1/segmentation-efficientnet-b3.onnx
model-engine-file=../../../../tritonserver/models/customnet/1/segmentation-efficientnet-b3.trt

process-mode=1
network-mode=2 # 0: FP32, 1: INT8, 2: FP16
network-type=2 # 0: Detector, 1: Classifier, 2: Segmentation, 3: Instance Segmentation

threshold=0.1
num-detected-classes=2
cluster-mode=4

parse-bbox-func-name=NvDsInferParseCustomDetection
parse-bbox-instance-mask-func-name=NvDsInferParseCustomSegmentation
custom-lib-path=../../gst-plugins/gst-nvinferserver/nvdsinfer_custom_impl_obstacle/obstacle_detection

scaling-filter=1
scaling-compute-hw=1
symmetric-padding=0
maintain-aspect-ratio=1
  3. I found the custom parser function prototypes in the official documentation.
  • 3.1. Custom bounding box parsing function
extern "C" bool NvDsInferParseCustomDetection(
	std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
	NvDsInferNetworkInfo const& networkInfo,
	NvDsInferParseDetectionParams const& detectionParams,
	std::vector<NvDsInferParseObjectInfo>& objectList);
  • 3.2. Custom bounding box and instance mask parsing function
bool NvDsInferParseCustomInstanceMask(
	std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
	NvDsInferNetworkInfo const& networkInfo,
	NvDsInferParseDetectionParams const& detectionParams,
	std::vector<NvDsInferParseObjectInfo>& objectList);
  • 3.3. Custom semantic segmentation output parsing function
extern "C"
bool NvDsInferParseCustomSegmentation(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo, float segmentationThreshold,
    unsigned int numClasses, int* classificationMap,
    float*& classProbabilityMap);
  4. How can I register NvDsInferParseCustomSegmentation in my config_infer.txt? The official nvinfer documentation does not cover semantic segmentation.

Please refer to this segmentation sample. Here is the configuration of nvinfer. Please set custom-lib-path and parse-bbox-instance-mask-func-name for the custom postprocessing.
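For reference, the instance-segmentation-related keys in the nvinfer [property] group look roughly like this; the function name below is the prototype from the documentation, and the library path is a placeholder for your own build, not a value from this thread:

```ini
[property]
# 3 = instance segmentation: masks are attached to object meta and drawn by nvosd
network-type=3
output-instance-mask=1
# exported C symbol of your parser, and the shared library that contains it
parse-bbox-instance-mask-func-name=NvDsInferParseCustomInstanceMask
custom-lib-path=/path/to/libnvds_custom_parser.so
segmentation-threshold=0.3
```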

Thank you for the reply!

I revised the config and custom parser by referring to your advice.

infer_config.txt

[property]
gpu-id=0
gie-unique-id=1
interval=0

batch-size=3
net-scale-factor=0.003921569
model-color-format=0
infer-dims=3;512;512

onnx-file=../../../../tritonserver/models/raildet/1/segmentation-efficientnet-b3.onnx
model-engine-file=../../../../tritonserver/models/raildet/1/segmentation-efficientnet-b3.trt

process-mode=1
network-mode=2 # 0: FP32, 1: INT8, 2: FP16
network-type=3 # 0: Detector, 1: Classifier, 2: Segmentation, 3: Instance Segmentation

num-detected-classes=1
cluster-mode=4

parse-bbox-instance-mask-func-name=CUSTOM_SEGMENTATION
custom-lib-path=../../gst-plugins/gst-nvinferserver/nvdsinfer_custom_impl_obstacle/obstacle_detection

segmentation-threshold=0.3

output-instance-mask=1
output-blob-names=1145
scaling-filter=1
scaling-compute-hw=1
symmetric-padding=0
maintain-aspect-ratio=1

custom_parser.cpp

#include <iostream>
#include <string>
#include <vector>

#include "nvdsinfer_custom_impl.h"

// Scan the mask buffer and compute the tight bounding box of all mask pixels.
void getMaskDimension(float* buf, int w, int h, int& left, int& top, int& width, int& height)
{
    int right = 0, bottom = 0;
    left = w, top = h;
    bool has_mask = false;

    for(int y = 0; y < h; y++) {
        for(int x = 0; x < w; x++) {
            float val = buf[y * w + x];
            if(val >= 0.0f) {
                has_mask = true;
                if (x < left) left = x;
                if (x > right) right = x;
                if (y < top) top = y;
                if (y > bottom) bottom = y;
            }
        }
    }

    if (!has_mask) {
        left = top = width = height = 0;
        return;
    }

    width = right - left + 1;
    height = bottom - top + 1;
}

// Crop the bounding-box region out of the full mask buffer and binarize it.
void copy_mask(float* dst, float* src, int w, int h,
               int mask_left, int mask_top, int mask_width, int mask_height) {
    for (int y = 0; y < mask_height; y++) {
        for (int x = 0; x < mask_width; x++) {
            int src_x = mask_left + x;
            int src_y = mask_top + y;
            float val = src[src_y * w + src_x];

            dst[y * mask_width + x] = (val > 0.0f) ? 1.0f : 0.0f;
        }
    }
}


extern "C" bool CUSTOM_SEGMENTATION(
    const std::vector<NvDsInferLayerInfo> &outputLayersInfo,
    const NvDsInferNetworkInfo  &networkInfo,
    const NvDsInferParseDetectionParams &detectionParams,
    std::vector<NvDsInferInstanceMaskInfo> &objectList) {

    const NvDsInferLayerInfo* mask_layer = nullptr;

    for (const auto& layer : outputLayersInfo) {
        if (layer.layerName && std::string(layer.layerName) == "1145") {
            mask_layer = &layer;
            break;
        }
    }

    if (!mask_layer) {
        std::cerr << "ERROR: Output layer '1145' not found.\n";
        return false;
    }

    int channels = mask_layer->inferDims.d[0];
    int height = mask_layer->inferDims.d[1];
    int width = mask_layer->inferDims.d[2];

    if (channels != 1) {
        std::cerr << "ERROR: Expected output shape [1 x H x W], got [" 
                  << channels << " x " << height << " x " << width << "]\n";
        return false;
    }

    float* mask_data = static_cast<float*>(mask_layer->buffer);

    int left, top, mwidth, mheight;
    getMaskDimension(mask_data, width, height, left, top, mwidth, mheight);

    if (mwidth <= 0 || mheight <= 0) return true;

    float* new_mask = new float[mwidth * mheight];
    copy_mask(new_mask, mask_data, width, height, left, top, mwidth, mheight);

    NvDsInferInstanceMaskInfo obj;
    obj.left = left;
    obj.top = top;
    obj.width = mwidth;
    obj.height = mheight;
    obj.classId = 0;
    obj.detectionConfidence = 1.0f;
    obj.mask = new_mask;
    obj.mask_size = sizeof(float) * mwidth * mheight;
    obj.mask_width = mwidth;
    obj.mask_height = mheight;

    objectList.push_back(obj);

    return true;
}

nvosd config.yml

osd:
  enable: 1
  gpu-id: 0
  border-width: 1
  display-text: 0
  text-size: 15
  text-color: 1;1;1;1
  text-bg-color: 0.3;0.3;0.3;1
  font: Serif
  show-clock: 0
  clock-x-offset: 800
  clock-y-offset: 820
  clock-text-size: 12
  clock-color: 1;0;0;0
  display-mask: 1
  nvbuf-memory-type: 0

After the above modifications, the custom parser function runs and mask values are produced. However, the mask is not drawn in the nvosd output. What is the problem?

  1. Which sample are you testing or referring to? What is the complete media pipeline?
  2. nvosd is responsible for drawing the instance segmentation mask. Please refer to the sample in my last comment.

  1. I’m using deepstream-parallel-inference-app.
  2. How can I use segvisual in my case?

deepstream-parallel-inference-app supports doing inference in parallel and merging the metadata. If you are testing one instance segmentation model, please use the tao_segmentation sample above instead. If not, which models are you using, respectively? What is the media pipeline? Do you need to merge the metadata from different models? Thanks!

I use two types of primary detectors

  1. Object Detector (YOLOv7)
  2. Railway Detector (TepNet)

This is my media pipeline

Thank you for the reply!

The output of TepNet is a binary mask filled with probabilities. For the pixels that exceed the threshold, I want to draw the railway on the OSD.

source3_config.yml

osd:
  enable: 1
  gpu-id: 0
  border-width: 1
  display-text: 1
  text-size: 15
  text-color: 1;1;1;1
  text-bg-color: 0.3;0.3;0.3;1
  font: Serif
  show-clock: 0
  clock-x-offset: 800
  clock-y-offset: 820
  clock-text-size: 12
  clock-color: 1;0;0;0
  display-mask: 1
  nvbuf-memory-type: 0

custom_parser.cpp

extern "C" bool TEPNET_SEGMENTATION (
    const std::vector<NvDsInferLayerInfo> &outputLayersInfo,
    const NvDsInferNetworkInfo  &networkInfo,
    const NvDsInferParseDetectionParams &detectionParams,
    std::vector<NvDsInferInstanceMaskInfo> &objectList) {

    const NvDsInferLayerInfo* mask_layer = nullptr;

    for (const auto& layer : outputLayersInfo) {
        if (layer.layerName && std::string(layer.layerName) == "1145") {
            mask_layer = &layer;
            break;
        }
    }

    if (!mask_layer) {
        std::cerr << "ERROR: Output layer '1145' not found.\n";
        return false;
    }

    int channels = mask_layer->inferDims.d[0];
    int height = mask_layer->inferDims.d[1];
    int width = mask_layer->inferDims.d[2];

    if (channels != 1) {
        std::cerr << "ERROR: Expected output shape [1 x H x W], got [" 
                  << channels << " x " << height << " x " << width << "]\n";
        return false;
    }

    float* mask_data = static_cast<float*>(mask_layer->buffer);

    int left, top, mwidth, mheight;
    getMaskDimension(mask_data, width, height, left, top, mwidth, mheight);

    if (mwidth <= 0 || mheight <= 0) return true;

    float* new_mask = new float[mwidth * mheight];
    copy_mask(new_mask, mask_data, width, height, left, top, mwidth, mheight);

    NvDsInferInstanceMaskInfo obj;
    obj.left = left;
    obj.top = top;
    obj.width = mwidth;
    obj.height = mheight;
    obj.classId = 1;
    obj.detectionConfidence = 1.0f;
    obj.mask = new_mask;
    obj.mask_size = sizeof(float) * mwidth * mheight;
    obj.mask_width = mwidth;
    obj.mask_height = mheight;

    objectList.push_back(obj);

    return true;
}

config_infer.txt

[property]
gpu-id=0
gie-unique-id=1
interval=0

batch-size=3
net-scale-factor=0.003921569
model-color-format=0
infer-dims=3;512;512

onnx-file=../../../../tritonserver/models/tepnet/1/segmentation-efficientnet-b3.onnx
model-engine-file=../../../../tritonserver/models/tepnet/1/segmentation-efficientnet-b3.trt

process-mode=1 # 1: Primary, 2: Secondary
network-mode=2 # 0: FP32, 1: INT8, 2: FP16
network-type=3 # 0: Detector, 1: Classifier, 2: Segmentation, 3: Instance Segmentation

num-detected-classes=2
cluster-mode=4

parse-bbox-instance-mask-func-name=TEPNET_SEGMENTATION
custom-lib-path=../../gst-plugins/gst-nvinferserver/nvdsinfer_custom_impl_obstacle/obstacle_detection

segmentation-threshold=0.3

output-instance-mask=1
symmetric-padding=0
maintain-aspect-ratio=1

Could you share the configuration of deepstream_parallel_inference_app, like source4_1080p_dec_parallel_infer.yml?

I fixed the problem: the input image was being resized to 512x288 because of the maintain-aspect-ratio parameter set in infer_config.

When the mask output is mapped within the 288-pixel range by adjusting the parameters, it is visualized successfully.

Thank you for your help :)

Glad to know you fixed it. Is this still a DeepStream issue to support? Thanks!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.