Presenter : Aydin Ayanzadeh
Email: Ayanzadeh17@itu.edu.tr
Computer Vision, Dr.-Ing. Hazım Kemal Ekenel, Spring 2018
Semantic Segmentation with Convolutional Neural Networks
Agenda
- Introduction
- Project description
- Related work
- Fine-tuning the model
- Experiment results
- Demo of work
- WBS
- Gantt chart
Introduction
● Microsoft Common Objects in Context (COCO) [1]
○ 82,783 images for training, 40,504 for validation
● PASCAL VOC 2012, PASCAL-Context, Cityscapes, etc.
○ Instance-level semantic segmentation
○ Semantic segmentation
[1] http://cocodataset.org/#home
Scene Segmentation
Importance of Semantic Segmentation
● Autonomous driving
● Medical imaging
Mask R-CNN
● State-of-the-art multi-task model for visual scene understanding:
○ object detection
○ classification
○ instance segmentation
● Highly modular and easy to train
● Extends Faster R-CNN to pixel-level segmentation
● Based on Faster R-CNN + mask branch, RoIAlign
Mask R-CNN
● Proposals are evaluated by Intersection over Union (IoU) with the ground-truth boxes:
○ the best regions are kept as positive examples
○ the worst regions (IoU < 0.3) are used as negatives for training
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
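The IoU test on proposals can be sketched in a few lines of plain Python. This is only an illustration, not the Mask R-CNN implementation: boxes are assumed to be (x1, y1, x2, y2) tuples, the 0.3 negative threshold comes from the slide, and the 0.7 positive threshold and the "ignored" label are assumptions made for the example.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each proposal by its best IoU with any ground-truth box.

    neg_thresh=0.3 follows the slide; pos_thresh=0.7 is an assumed value.
    """
    labels = []
    for proposal in proposals:
        best = max((box_iou(proposal, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append("positive")
        elif best < neg_thresh:
            labels.append("negative")
        else:
            labels.append("ignored")  # neither clearly good nor clearly bad
    return labels


# Made-up boxes for illustration.
gt = [(10, 10, 50, 50)]
proposals = [(12, 12, 50, 50), (40, 40, 90, 90), (30, 10, 70, 50)]
labels = label_proposals(proposals, gt)
print(labels)
```

The first proposal almost covers the ground-truth box, the second barely touches it, and the third overlaps half of it, so the three boxes exercise all three branches.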
Comparison of the R-CNN family
Requirements for training
● 8 GPUs (so 8 images per mini-batch on 8 GPUs)
● Train the model for 24k iterations
● Training takes about 2 days on a single 8-GPU machine (about 16 days on one GPU)
● Nvidia Tesla M40 GPU (without additional features)
● Runs at 3 fps at test time
Fine-Tuning the Model
● Truncate the last layer (softmax layer) of the pre-trained network
● Replace it with a new softmax layer that is relevant to our own problem
Pros and cons of the fine-tuned model
Disadvantage: segmentation accuracy is lower than that of the original model.
Advantage: it is much faster.
https://www.slideshare.net/AndrKarpitenko/practical-deep-learning
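The two fine-tuning steps (truncate the softmax layer, attach a new one) can be sketched with a toy network in plain Python. Everything here is hypothetical and chosen only to make the sketch runnable: `pretrained_features` stands in for the frozen pre-trained layers, the 80-class old head and the 2-class new problem are assumed sizes, and the four training points are made up. It is not the project's actual training code.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxLayer:
    """Linear layer followed by softmax; the only trainable part here."""

    def __init__(self, in_dim, num_classes, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def __call__(self, x):
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) + b
                        for row, b in zip(self.w, self.b)])

    def sgd_step(self, x, target, lr=0.3):
        """One cross-entropy gradient step on a single example."""
        probs = self(x)
        for k, row in enumerate(self.w):
            grad = probs[k] - (1.0 if k == target else 0.0)
            self.b[k] -= lr * grad
            for j, xj in enumerate(x):
                row[j] -= lr * grad * xj


def pretrained_features(x):
    """Hypothetical stand-in for the frozen pre-trained layers."""
    return [max(0.0, x[0] + x[1]), max(0.0, x[0] - x[1])]


def predict(model, x):
    """Run x through every layer; return the index of the most likely class."""
    for layer in model:
        x = layer(x)
    return max(range(len(x)), key=lambda k: x[k])


rng = random.Random(0)
# Pre-trained network: frozen features + softmax over 80 classes (assumed).
pretrained = [pretrained_features, SoftmaxLayer(2, 80, rng)]

# Fine-tuning: truncate the last layer and attach a new 2-class softmax layer.
fine_tuned = pretrained[:-1] + [SoftmaxLayer(2, 2, rng)]

# Made-up data for the new problem: label 1 when x[0] > x[1].
data = [([1.0, 0.2], 1), ([0.9, 0.1], 1), ([0.2, 1.0], 0), ([0.1, 0.8], 0)]

new_head = fine_tuned[-1]
for _ in range(200):
    for x, label in data:
        # Only the new head is updated; the pre-trained features stay frozen.
        new_head.sgd_step(pretrained_features(x), label)

predictions = [predict(fine_tuned, x) for x, _ in data]
print(predictions)
```

Because only the small new layer is trained, each update touches very few parameters, which is one reason fine-tuning tends to be cheaper than training the whole network.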
Dataset Analysis
● Mask R-CNN does detection, classification and instance segmentation.
● Based on Faster R-CNN + mask branch, RoIAlign
● State-of-the-art detection and instance segmentation on MS COCO and Cityscapes
SegNet
DeepLab
● DeepLab v1
● DeepLab v2
● DeepLab v3
Convolution
Dilated Convolution
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
Atrous Convolution
● A small field of view gives accurate localization
● A large field of view assimilates context
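The trade-off between small and large fields of view can be seen in a 1-D sketch of atrous (dilated) convolution. This is a minimal illustration in plain Python with a made-up signal and kernel, not DeepLab's 2-D layer; like most deep learning libraries it computes a cross-correlation (no kernel flip) with no padding.

```python
def field_of_view(kernel_size, rate):
    """Number of input samples spanned by a kernel with the given dilation rate."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate=1):
    """Unpadded 1-D atrous convolution; rate=1 is an ordinary convolution."""
    span = field_of_view(len(kernel), rate)
    output = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `rate` samples apart in the input.
        output.append(sum(k * signal[start + i * rate]
                          for i, k in enumerate(kernel)))
    return output


signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up input
kernel = [1, 1, 1]                        # made-up 3-tap kernel

dense = atrous_conv1d(signal, kernel, rate=1)    # each output sees 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # each output sees 5 samples
print(field_of_view(3, 1), dense)
print(field_of_view(3, 2), dilated)
```

Both calls use the same three weights; raising the rate widens the field of view without adding parameters, which is the property the bullets above refer to.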
DeepLab v3 Architecture
Experimental Results
● Complicated images
SiMiTSeg
Evaluation Metrics
● Pixel Accuracy (PA)
● Mean Pixel Accuracy (MPA)
● Mean Intersection over Union (MIoU)
● Frequency Weighted Intersection over Union (FWIoU)
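The four metrics can all be computed from a confusion matrix. The sketch below uses their commonly used definitions, with cm[i][j] counting pixels of true class i predicted as class j; the label lists are made up, and the code assumes every class occurs at least once in the ground truth so no denominator is zero.

```python
def confusion_matrix(truth, pred, num_classes):
    """cm[i][j] = number of pixels with true class i predicted as class j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return PA, MPA, MIoU and FWIoU (assumes every class has true pixels)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]                           # per-class ground truth
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # per-class predictions
    correct = [cm[i][i] for i in range(k)]
    # IoU per class: intersection / (ground truth + predicted - intersection).
    ious = [correct[i] / (true_counts[i] + pred_counts[i] - correct[i])
            for i in range(k)]
    return {
        "PA": sum(correct) / total,
        "MPA": sum(correct[i] / true_counts[i] for i in range(k)) / k,
        "MIoU": sum(ious) / k,
        "FWIoU": sum(true_counts[i] * ious[i] for i in range(k)) / total,
    }


# Made-up flattened label maps for a two-class problem.
truth = [0, 0, 0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(confusion_matrix(truth, pred, 2))
print(metrics)
```

With an unbalanced ground truth such as this one, MIoU and FWIoU differ, because FWIoU weights each class IoU by how often the class appears.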
Experimental Results
Mask R-CNN or DeepLab?
Future Work
● Real-time segmentation
● Face segmentation
Discussion
● Complicated images
● Image quality
● Dataset size
WBS
Project
● Project implementation
● Datasets
● Collecting the datasets
● Research among state-of-the-art methods
● Extended steps of project
● Finding nominal methods
● Building the proposed approach
● Analyzing the performance of approaches
● Organizing and categorizing datasets
● Preliminary steps of project
● Implementing additional techniques
● Milestone of extended step of project
Gantt Chart
References
1. G. Eason, B. Noble, and I.N. Sneddon, “On certain integrals of Lipschitz-Hankel type involving products of Bessel functions,” Phil. Trans. Roy. Soc. London, vol. A247, pp. 529-551, April 1955.
2. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
3. Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "SegNet: A deep convolutional encoder-decoder architecture for image segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence 39.12 (2017): 2481-2495.
4. Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.
5. Jégou, Simon, et al. "The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation." Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on. IEEE, 2017.
6. Paszke, Adam, et al. "ENet: A deep neural network architecture for real-time semantic segmentation." arXiv preprint arXiv:1606.02147 (2016).
7. Chaurasia, Abhishek, and Eugenio Culurciello. "LinkNet: Exploiting encoder representations for efficient semantic segmentation." arXiv preprint arXiv:1707.03718 (2017).
8. He, Kaiming, et al. "Mask R-CNN." Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017.
9. Zhao, Hengshuang, et al. "Pyramid scene parsing network." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
10. Lin, Guosheng, et al. "RefineNet: Multi-path refinement networks for high-resolution semantic segmentation." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
11. Islam, Md Amirul, et al. "Gated feedback refinement network for dense image labeling." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.
12. Hong, Seunghoon, Hyeonwoo Noh, and Bohyung Han. "Decoupled deep neural network for semi-supervised semantic segmentation." Advances in Neural Information Processing Systems. 2015.
13. Souly, Nasim, Concetto Spampinato, and Mubarak Shah. "Semi and weakly supervised semantic segmentation using generative adversarial network." arXiv preprint arXiv:1703.09695 (2017).

Editor's Notes

  • #4: Microsoft Common Objects in Context (COCO) [31] is another large-scale dataset for image recognition, segmentation, and captioning. It features various challenges, the detection challenge being the most relevant for this field since one of its parts focuses on segmentation. That challenge, which features more than 80 classes, provides more than 82,783 images for training and 40,504 for validation, and its test set consists of more than 80,000 images. In particular, the test set is divided into four different subsets or splits: test-dev (20,000 images) for additional validation and debugging; test-standard (20,000 images), the default test data for the competition and the one used to compare state-of-the-art methods; test-challenge (20,000 images), the split used for the challenge when submitting to the evaluation server; and test-reserve (20,000 images), a split used to protect against possible overfitting in the challenge (if a method is suspected to have made too many submissions or trained on the test data, its results will be compared with the reserve split). Its popularity and importance have ramped up since its appearance thanks to its large scale. In fact, the results of the challenge are presented yearly at a joint workshop at the European Conference on Computer Vision (ECCV), together with ImageNet's.
  • #5: Instance segmentation is challenging because it requires the correct detection of all objects in an image while also precisely segmenting each instance. It therefore combines elements from the classical computer vision tasks of object detection, where the goal is to classify individual objects and localize each using a bounding box, and semantic segmentation, where the goal is to classify each pixel into a fixed set of categories without differentiating object instances. (Figure 1: The Mask R-CNN framework for instance segmentation.)
  • #6: Mask R-CNN is conceptually simple: Faster R-CNN has two outputs for each candidate object, a class label and a bounding-box offset; to this we add a third branch that outputs the object mask. Mask R-CNN is thus a natural and intuitive idea. But the additional mask output is distinct from the class and box outputs, requiring extraction of a much finer spatial layout of an object. RoIPool [12] is a standard operation for extracting a small feature map (e.g., 7 × 7) from each RoI. RoIPool first quantizes a floating-number RoI to the discrete granularity of the feature map.
  • #7: Mask R-CNN also outputs a binary mask for each RoI. This is in contrast to most recent systems, where classification depends on mask predictions.
  • #8: Faster R-CNN consists of two stages. The first stage, called a Region Proposal Network (RPN), proposes candidate object bounding boxes.
  • #11: Convolution layers retain spatial orientation and similar information.
  • #14: Speed: atrous convolution in DeepLab v2, together with the dense deep convolutional neural network, improves video run time from 0.5 seconds per frame in DeepLab v1 to 8 fps in DeepLab v2. Accuracy: owing to dense feature extraction, DeepLab v2 achieves higher accuracy in detection and segmentation tasks. Simplicity: atrous convolution and the other novel techniques used in these models make them simpler.
  • #18: 1,464 and 1,449 images (the PASCAL VOC 2012 segmentation training and validation sets).