International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal
Semantic Segmentation Using Deep Learning
Shubham Singh¹, Sajal Kaushik², Rahul Vats³, Arihant Jain⁴, and Narina Thakur⁵
¹Bharati Vidyapeeth’s College of Engineering, A-4, Paschim Vihar, New Delhi 110063
----------------------------------------------------------------------***-------------------------------------------------------------------------
Abstract— Semantic image segmentation is an essential component of modern autonomous driving systems, as an accurate understanding
of the surrounding scene is crucial to navigation and action planning. Current state-of-the-art approaches in semantic image segmentation
rely on pre-trained networks that were initially developed for classifying images as a whole. While these networks exhibit outstanding
recognition performance, they lack localization accuracy. Therefore, additional memory-intensive units have to be included to obtain
pixel-accurate segmentation masks at the full image resolution. To alleviate this problem, we implemented several standard models,
namely GCN, DeepLabV3, PSPNet, and FC-DenseNet, on the CamVid image-frame dataset, attempted to optimize them, and then proposed a
novel FRRN-based architecture that exhibits strong localization and recognition performance. We combine multi-scale context with
pixel-level accuracy by using four processing streams within our network (compared with two in the original FRRN): one stream carries
information at the full image resolution, enabling precise adherence to segment boundaries, while the other streams undergo a sequence
of pooling operations to obtain robust features for recognition. The streams are coupled at the full and half image resolutions using
residuals. Our approach achieves an intersection-over-union score of 0.87 on the CamVid dataset.
I. INTRODUCTION
Semantic segmentation is an important image analysis task and a key problem in computer vision. It describes the
process of associating each pixel of an image with a class label such as car, bus, road, or pole. Semantic segmentation is widely
used in autonomous driving, medical image segmentation, geo-sensing, facial segmentation, precision agriculture,
human-machine interaction, image search engines, and many more. These problems have been addressed with traditional machine
learning and computer vision techniques, but advances in deep learning have created considerable room to
improve them in terms of accuracy and efficiency.
Semantic segmentation is more informative than image classification and object localization. While image classification only
indicates the presence of an object in an image, and object localization locates objects by drawing bounding boxes around them
before classification, semantic segmentation classifies each and every pixel of the objects in the image. Instance segmentation is
similar to semantic segmentation, but it additionally distinguishes different instances of a class within an image, such as two cars
in the same image. Semantic segmentation not only predicts the classes of objects but also gives the spatial location of those classes
in the image. Further, different instances of the same class can be separated, and components of already segmented classes can be
classified as well. This paper, however, focuses only on general per-pixel classification, i.e., the same label is given to different
instances of the same class, and if they overlap, their boundary is not distinguished, as shown in Fig. 1 (c).
Fig. 1.
While image segmentation groups similar pixels of a class together, video segmentation partitions a video into disjoint sets of
consecutive, homogeneous frames that exhibit coherence in both motion and appearance. For segmenting the dynamic scenes of
a video at high quality, deep learning models have paved the way to better performance than the traditional algorithms.
Video segmentation is useful in activity recognition and other visual enhancements.
II. RELATED WORK
A. BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
BiSeNet performs real-time semantic segmentation by taking into account both contextual and spatial features. A spatial path
with a small stride preserves spatial information and generates high-resolution features, while a context path with a fast
downsampling strategy, running in parallel to the spatial path, obtains a sufficiently large receptive field. The two paths
are fused by a Feature Fusion Module (FFM), which improves accuracy without loss of speed, and an Attention Refinement Module
refines the features of each stage using global average pooling. [1]
B. SegNet: A Deep Convolutional Encoder-Decoder Architecture
SegNet performs semantic pixel-wise segmentation. The architecture consists of an encoder, similar to the
convolutional layers of the VGG16 network, and a decoder followed by a pixel-wise classification layer. The encoder convolves
the given input to obtain a set of features, which are batch-normalized; ReLU is then applied element-wise, followed by max
pooling and subsampling of the result. [2]
C. MobileNets for Semantic Segmentation
This model is based on depth-wise separable convolutions, a form of factorized convolution that splits a standard
convolution into a depth-wise convolution and a 1×1 convolution called a point-wise convolution. A standard convolution
simultaneously filters and combines inputs into a new set of outputs in a single step, whereas the depth-wise separable
convolution does this in two layers: a separate layer for filtering and a separate layer for combining. [3]
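As a rough illustration (our sketch, not the paper's code), this factorization can be written in PyTorch as a grouped convolution followed by a 1×1 point-wise convolution; the channel sizes are arbitrary:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel) convolution followed by a 1x1 point-wise convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # groups=in_ch makes each filter operate on a single input channel (filtering)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        # the 1x1 convolution combines the filtered channels into out_ch outputs
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 64, 64)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```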
D. RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation
RefineNet is a generic multi-path refinement network that uses long-range residual connections for high-resolution predictions.
Information from multiple paths at different resolutions, potentially connected over long ranges, is assimilated using a generic
building block, also termed RefineNet. Coarse high-level semantic features captured by the deep layers are thus refined using
fine-grained features from earlier convolutions. [4]
E. ICNet for Real-Time Semantic Segmentation
The Image Cascade Network (ICNet) incorporates multi-resolution branches under proper label guidance. Cascade image
inputs, i.e., images with varying resolutions, are used together with a cascade feature fusion (CFF) unit. The full-resolution
input image is downsampled by factors of four and two, producing cascade inputs to the high- and medium-resolution branches,
and CFF combines the cascade features from the inputs at the various resolutions. [5]
III. DATASET AND PREPROCESSING
The dataset is taken from the Cambridge-driving Labeled Video (CamVid) database, a collection of videos with object-class
semantic labels. The CamVid dataset consists of the original video sequences, the list of class labels, and the hand-labeled
frames.
It provides ten minutes of 30 Hz footage with corresponding semantically labeled images at 1 Hz.
The dataset consists of six directories: train, train labels, test, test labels, val, and val labels. The label directories
contain the labeled data, while the other directories contain the actual images without labels. [6]
A. Scaling
This step ensures that all images have the same size and aspect ratio; the images are then scaled to the input size expected by each model.
B. Normalization
Normalization speeds up convergence while training the model. The mean is subtracted from each pixel and the result is divided
by the standard deviation, so that the resulting distribution resembles a zero-centered Gaussian. Pixel intensities are also
rescaled to lie in the range [0, 1].
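A minimal sketch of these two preprocessing steps, under our own assumptions about the target size and the resizing library (OpenCV); this is not the paper's code:

```python
import numpy as np
import cv2  # assumed available for resizing

def preprocess(image, size=(480, 360)):
    """Resize to a fixed (width, height), rescale to [0, 1], then standardize."""
    image = cv2.resize(image, size).astype(np.float32)
    image /= 255.0                                     # intensities into [0, 1]
    mean, std = image.mean(axis=(0, 1)), image.std(axis=(0, 1))
    return (image - mean) / (std + 1e-7)               # zero mean, unit variance
```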
IV. APPROACHES
In this paper, we implement and evaluate U-net, Dilated U-net, PSPNet, FC-DenseNet, GCN, DeepLabV3, and an optimized FRRN.
A. U-net
The U-net architecture consists of two paths, known as the encoder and the decoder. The encoder is also called the
contraction path, and the decoder the symmetric expanding path.
The encoder captures the context in the image: convolution blocks followed by max-pool downsampling encode the
input image into feature representations at multiple levels. It is therefore a stack of convolutional and max-pooling
layers, and this stage is also called downsampling.
The decoder consists of upsampling and concatenation followed by regular convolution operations; it enables precise
localization using transposed convolutions.
U-net is thus an end-to-end fully convolutional network: it contains convolutional layers only, with no dense
layers, which is why it can accept an image of any size.
Pooling layers increase the field of view and aggregate context, but they discard the "where" information, whereas semantic
segmentation requires the exact alignment of class maps and thus needs the "where" information to be preserved. The skip
connections of U-net restore this information, which is why U-net is preferred here. [7]
We have tuned the hyperparameters of the U-net model to improve the IoU metric.
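A compact U-net-style sketch of the encoder/decoder coupling described above (illustrative only; the depth and channel counts are our assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                         nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.enc1, self.enc2 = conv_block(3, 64), conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)                         # contraction (downsampling)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)  # transposed-conv upsampling
        self.dec1 = conv_block(128, 64)                     # 128 = 64 upsampled + 64 skip
        self.head = nn.Conv2d(64, n_classes, 1)             # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d1)
```

Because every layer is convolutional, the same module runs on any input whose sides are divisible by the pooling factor.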
B. Dilated U-net
In this model, the plain convolutions are replaced by dilated convolutions, which are also called atrous convolutions.
For a 1D signal x[i], the output y[i] of a dilated convolution with dilation rate r and a filter w[s] of size S is

y[i] = \sum_{s=1}^{S} x[i + r \cdot s] \, w[s]

Dilated convolutions are translation equivariant:

f(g(x)) = g(f(x))

where g(·) is the convolution operation and f(·) is a translation.
This helps reduce the number of parameters massively, because the receptive field grows aggressively with the dilation rate.
Pooling additionally helps pixel-wise classification, but the pooling layers decrease the resolution of the input images, and
as a result the dilated U-net model does not work well here. [8]
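In PyTorch, swapping a standard convolution for a dilated one only requires setting the `dilation` argument; choosing `padding` equal to the dilation rate keeps the spatial size of a 3×3 layer unchanged (the rate below is our own example):

```python
import torch.nn as nn

rate = 2  # dilation rate r; a 3x3 kernel now covers a 5x5 region
dilated = nn.Conv2d(64, 64, kernel_size=3, dilation=rate, padding=rate)
```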
C. PSPNet
PSPNet stands for Pyramid Scene Parsing Network. The model proposes a pyramid pooling module to aggregate the context of the
image, hence the name. A ResNet backbone modified with dilated convolutions is used, and the pyramid pooling module is added
on top of it. The pyramid pooling module captures information by applying pooling layers with large kernels: it concatenates
the feature maps from the ResNet with the upsampled outputs of parallel pooling layers whose kernels cover the whole image,
half of the image, and small portions of the image.
An auxiliary loss, called intermediate supervision, is also applied after the fourth stage of the ResNet (i.e., at the input
to the pyramid pooling module).
The feature-map resolution is also better preserved in PSPNet, because the dilated backbone avoids further downsampling. [9]
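A sketch of a pyramid pooling module in this spirit (the bin sizes 1, 2, 3, and 6 follow the original PSPNet paper [9]; everything else is an illustrative assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        # one adaptive-pool + 1x1 conv branch per pyramid level
        self.branches = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), 1, bias=False))
            for b in bins])

    def forward(self, x):
        h, w = x.shape[2:]
        # upsample each pooled branch back to the input size, then concatenate
        pooled = [F.interpolate(branch(x), size=(h, w), mode='bilinear',
                                align_corners=False) for branch in self.branches]
        return torch.cat([x] + pooled, dim=1)  # doubles the channel count
```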
Fig. 2. Input Image - Ground Truth - Predicted Image (PSPNet)
Fig. 3. IoU vs Epoch (PSPNet)
Fig. 4. Loss vs Epoch (PSPNet)
D. Fully Convolutional DenseNets
This is a CNN built from Densely Connected Convolutional Networks (DenseNets). It is based on the observation that if each
layer is directly connected to every other layer in a feed-forward fashion, the model becomes more accurate and easier and
more efficient to train. DenseNets have several advantages: they alleviate the vanishing-gradient problem, strengthen feature
propagation, encourage feature reuse, and reduce the number of parameters.
FCNs are built from a downsampling path, an upsampling path, and skip connections. The skip connections help the
upsampling path recover spatially detailed information from the downsampling path by reusing feature maps.
The objective of the model is to further exploit feature reuse while avoiding feature explosion on the upsampling path
of the network. [10]
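A minimal dense-block sketch illustrating the "every layer connected to all later layers" idea (the growth rate and depth are arbitrary choices, not the paper's):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            # each layer sees the concatenation of all previous feature maps
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth), nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1, bias=False)))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # in_ch + n_layers * growth channels
```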
Fig. 5. Input Image - Ground Truth - Predicted Image (FC DenseNet)
Fig. 6. IoU vs Epoch (FC DenseNet)
Fig. 7. Loss vs Epoch (FC DenseNet)
E. Global Convolutional Network (GCN)
GCN proposes an encoder-decoder architecture with very large convolution kernels, with the kernel size increased up to the
spatial size of the feature map. Such convolutions are used because fully connected layers do not perform semantic
segmentation well. Large kernels give the model a very large receptive field, whereas with small kernels the model gathers
information from a much smaller area. However, large kernels have many parameters and are computationally expensive, so the
convolutions are approximated; this approximated convolution is called a global convolution. The encoder is a ResNet (without
any dilated convolutions), and the decoder consists of GCN modules and deconvolutions. [11]
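The approximation used in the GCN paper [11] replaces a dense k×k kernel with two separable branches, (k×1 then 1×k) and (1×k then k×1), whose outputs are summed; a sketch with illustrative channel sizes:

```python
import torch.nn as nn

class GlobalConvModule(nn.Module):
    """Approximates a k x k convolution with two separable branches."""
    def __init__(self, in_ch, out_ch, k=7):
        super().__init__()
        p = k // 2
        self.left = nn.Sequential(nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
                                  nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))
        self.right = nn.Sequential(nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
                                   nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))

    def forward(self, x):
        return self.left(x) + self.right(x)  # sum of the two separable branches
```

The two separable branches cost O(k) parameters per channel pair instead of O(k²) for a dense kernel, which is what makes the large kernel affordable.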
Fig. 8. Input Image - Ground Truth - Predicted Image (GCN)
Fig. 9. IoU vs Epoch (GCN)
F. DeeplabV3
DeepLab is Google's open-source model for semantic segmentation, in which the concept of atrous convolution, a generalized
form of the convolution operation, is introduced. It uses atrous convolutions with rates 6, 12, and 18, where the rate is
a parameter that controls the effective field of view of the convolution. Inspired by the success of spatial pyramid
pooling, Atrous Spatial Pyramid Pooling (ASPP) applies four parallel convolutions on top of the feature map: one 1×1
convolution and three 3×3 atrous convolutions with rates 6, 12, and 18. This is effective for resampling features at
different scales and thereby accurately and efficiently classifying regions of arbitrary scale. Bilinear upsampling is
used to scale the features to the correct dimensions.
In the later version, DeepLabV3+, a decoder module is added on top of the regular DeepLabV3 model. Instead of using
bilinear upsampling, the encoded features are upsampled and then concatenated with the corresponding low-level features
of the same spatial dimensions from the encoder module. [12]
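A sketch of the four ASPP branches described above, one 1×1 convolution plus three 3×3 atrous convolutions with rates 6, 12, and 18 (the image-pooling branch and output projection of the full DeepLabV3 head are omitted here):

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Conv2d(in_ch, out_ch, 1, bias=False)])
        for r in rates:
            # padding == dilation keeps the spatial size of the feature map
            self.branches.append(nn.Conv2d(in_ch, out_ch, 3, padding=r,
                                           dilation=r, bias=False))

    def forward(self, x):
        # run the parallel branches and concatenate their outputs
        return torch.cat([b(x) for b in self.branches], dim=1)
```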
Fig. 11. Input Image - Ground Truth - Predicted Image (DeepLabV3)
G. Optimized FRRN
In this FRRN-based model, we have added more FRRU units, which capture more local, pixel-level features for better
classification accuracy. Multi-scale features are extracted by each of the units and then passed along each of the
processing streams.
Fig. 12. IoU vs Epoch (DeepLabV3)
Fig. 13. Loss vs Epoch (DeepLabV3)
Our design is motivated by the need for networks that can jointly compute good high-level features for recognition
and good low-level features for localization. Regardless of the specific network design, obtaining good high-level features
requires a sequence of pooling operations.
The pooling operations reduce the size of the feature maps and increase the network's receptive field, as well as its
robustness against small translations in the image.
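A simplified FRRU-style sketch of how a pooled stream and a full-resolution residual stream exchange information (our own loose interpretation; the actual optimized FRRN uses more units and streams than shown here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFRRU(nn.Module):
    """Pooled stream y carries recognition features; residual stream z stays at
    full resolution and receives a correction from y at every unit."""
    def __init__(self, y_ch, z_ch, scale=2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(y_ch + z_ch, y_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.to_residual = nn.Conv2d(y_ch, z_ch, 1)  # project back to the z stream

    def forward(self, y, z):
        # pool the residual stream down to y's resolution and fuse the two streams
        z_down = F.max_pool2d(z, self.scale)
        y = self.body(torch.cat([y, z_down], dim=1))
        # upsample the correction and add it to the full-resolution residual stream
        z = z + F.interpolate(self.to_residual(y), scale_factor=self.scale,
                              mode='nearest')
        return y, z
```

The residual stream keeps full-resolution boundary information throughout, while the pooled stream is free to downsample for recognition.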
Fig. 14. Input Image - Ground Truth - Predicted Image (Optimized FRRN)
V. EXPERIMENTAL SETUP AND RESULTS
A. Evaluation Criteria And Procedure
Intersection over Union (IoU), equivalent to the Jaccard index, is used as the evaluation criterion. It compares the
similarity and diversity of two sets or images, and is defined as the size of the intersection of the two images divided
by the size of their union:

IoU(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}

where X and Y are the two sets or images.
Fig. 16. Loss vs Epoch (Optimized FRRN)
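The IoU above can be computed per class directly from predicted and ground-truth label maps and averaged; a NumPy sketch (our illustration, not the paper's evaluation code):

```python
import numpy as np

def iou_score(pred, target, n_classes):
    """Mean intersection-over-union across classes, given integer label maps."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                    # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```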
B. Experimental Results
Approach        Precision   Recall   F1 Score   IoU
PSPNet          0.74        0.74     0.74       0.81
FC-DenseNet     0.74        0.77     0.79       0.79
GCN             0.80        0.84     0.86       0.57
DeepLabV3       0.72        0.63     0.64       0.81
Our approach    0.84        0.82     0.82       0.87
VI. CONCLUSIONS
This paper presents the results of various deep learning models on the CamVid dataset, analyzing several metrics, including
the intersection-over-union score and the validation score, as the number of epochs increases.
REFERENCES
[1] Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N. (2018). Bisenet: Bilateral segmentation network for real-time semantic
segmentation. In Proceedings of the European Conference on Computer Vision(ECCV) (pp. 325-341).
[2] Badrinarayanan, V., Kendall, A., Cipolla, R. (2017). Segnet: A deep convolutional encoder-decoder architecture for
image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481-2495.
[3] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., ... Adam, H. (2017). Mobilenets: Efficient
convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
[4] Lin, G., Milan, A., Shen, C., Reid, I. (2017). Refinenet: Multi-path refinement networks for high-resolution semantic
segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1925-1934).
[5] Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J. (2018). ICNet for real-time semantic segmentation on high-resolution images.
In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 405-420).
[6] Brostow, G. J., Fauqueur, J., Cipolla, R. (2009). Semantic object classes in video: A high-definition ground truth database.
Pattern Recognition Letters, 30(2), 88-97.
[7] Ronneberger, O., Fischer, P., Brox, T. (2015, October). U-net: Convolutional networks for biomedical image
segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234-241).
Springer, Cham.
[8] Yu, F., Koltun, V.(2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
[9] Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on
computer vision and pattern recognition (pp. 2881-2890).
[10] Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., & Bengio, Y. (2017). The one hundred layers tiramisu: Fully
convolutional DenseNets for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops (pp. 11-19).
[11] Peng, C., Zhang, X., Yu, G., Luo, G., & Sun, J. (2017). Large Kernel Matters – Improve Semantic Segmentation by Global
Convolutional Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4353-4361).
[12] Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image
segmentation. arXiv preprint arXiv:1706.05587.
[13] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes
dataset for semantic urban scene understanding. In CVPR, 2016.
[14] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence
Zitnick. 2014. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision.
[15] N. Silberman, D. Hoiem, P. Kohli, R. Fergus. "Indoor segmentation and support inference from RGBD images". Proc. 12th
Eur. Conf. Comput. Vis., pp. 746-760, 2012.
[16] G. Neuhold, T. Ollmann, S. Rota Bulò, and P. Kontschieder. The Mapillary Vistas Dataset for Semantic Understanding of
Street Scenes. In International Conference on Computer Vision (ICCV), 2017.
[17] Brostow, G. J., Fauqueur, J., Cipolla, R. (2009). Semantic object classes in video: A high-definition ground truth database.
Pattern Recognition Letters, 30(2), 88-97.
[18] J. Fu, J. Liu, Y. Wang, and H. Lu. Stacked deconvolutional network for semantic segmentation. arXiv preprint
arXiv:1708.04943, 2017
[19] G. Lin, A. Milan, C. Shen, and I. Reid. Refinenet: Multi-path refinement networks for high-resolution semantic
segmentation. In CVPR, 2017.
[20] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang. Bisenet: Bilateral segmentation network for real-time semantic
segmentation. 2018.
[21] V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image
segmentation. TPAMI, 2017.
[22] Romera, E., Alvarez, J. M., Bergasa, L. M., Arroyo, R. (2018). Erfnet: Efficient residual factorized convnet for real-time
semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 19(1), 263-272.
[23] Lin, G., Shen, C., Van Den Hengel, A., & Reid, I. (2018). Exploring context with deep structured models for semantic
segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6), 1352-1366.
[24] Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., & Agrawal, A. (2018). Context encoding for semantic
segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7151-7160).
[25] R. P. K. Poudel, U. Bonde, S. Liwicki, and C. Zach. ContextNet: Exploring context and detail for semantic segmentation
in real-time. In BMVC, 2018.
[26] K. Rakelly, E. Shelhamer, T. Darrell, A. A. Efros, and S. Levine. Few-shot segmentation propagation with guided
networks. arXiv preprint arXiv:1806.07373, 2018.
[27] S. Jain and J. Gonzalez. Fast semantic segmentation on video using block motion-based feature interpolation. In
ECCV International Workshop on Video Segmentation, 2018.
[28] Tao Yang, Yan Wu, Junqiao Zhao, and Linting Guan. Semantic segmentation via highly fused convolutional network
with multiple soft cost functions. arXiv preprint arXiv:1801.01317, 2018.
[29] Gharghabi, S., Yeh, C. C. M., Ding, Y., Ding, W., Hibbing, P., LaMunion, S., ... & Keogh, E. (2019). Domain agnostic online
semantic segmentation for multi-dimensional time series. Data Mining and Knowledge Discovery, 33(1), 96-130.
[30] Chiu, H. P., Samarasekera, S., Kumar, R., Villamil, R., Murali, V., & Kessler, G. D. (2019). U.S. Patent Application
No. 16/101,201.
[31] Gharghabi, S., Yeh, C. C. M., Ding, Y., Ding, W., Hibbing, P., LaMunion, S., ... & Keogh, E. (2019). Correction to: Domain
agnostic online semantic segmentation for multi-dimensional time series. Data Mining and Knowledge Discovery, 1-2.
[32] Desai, A. D., Gold, G. E., Hargreaves, B. A., & Chaudhari, A. S. (2019). Technical Considerations for Semantic
Segmentation in MRI using Convolutional Neural Networks. arXiv preprint arXiv:1902.01977.
[33] Rakelly, K., Shelhamer, E., Darrell, T., Efros, A., & Levine, S. (2018). Conditional networks for few-shot semantic
segmentation.
[34] Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., & Cottrell, G. (2018, March). Understanding convolution for
semantic segmentation. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1451-1460). IEEE.
[35] Shimoda, M., Sada, Y., & Nakahara, H. (2019, April). Filter-Wise Pruning Approach to FPGA Implementation of Fully
Convolutional Network for Semantic Segmentation. In International Symposium on Applied Reconfigurable Computing
(pp. 371-386). Springer, Cham.
[36] Li, H., Xiong, P., Fan, H., & Sun, J. (2019). DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation.
arXiv preprint arXiv:1904.02216.