Object detection with deep learning

SUBMITTED TO: MR. B. SURESH
SUBMITTED BY:
HIMANSHU MAURYA (9917102004)
SUSHANT SHRIVASTAVA (9917102023)
BHUVNESH KUMAR BHARDWAJ (9917102028)

1. INTRODUCTION TO OBJECT DETECTION
○ Object detection is the task of finding objects in an image or a video: identifying which objects are present and where each one is located.
Fig. 1 Object detection

Literature Review.
• Object detection is a general term for computer vision techniques that classify and locate objects in an image. Modern object detection is largely based on convolutional neural networks (CNNs). Some of the most relevant system types today are Faster R-CNN, R-FCN, Single Shot MultiBox Detector (SSD) and YOLO (You Only Look Once) [1]. The original R-CNN method ran a neural network classifier on samples cropped from the image using externally computed box proposals, extracting features separately from every cropped sample. This approach was computationally expensive because of the large number of crops.
• Single Shot MultiBox Detector (SSD) differs from the R-CNN-based approaches in that it does not require a second, per-proposal classification stage. This makes it fast enough for real-time detection applications, but at the price of reduced precision. “SSD with MobileNet” refers to a model whose meta-architecture is SSD and whose feature extractor is MobileNet.

2. Generic object detection
● Generic object detection aims to locate and classify every object present in an image, labelling each one with a rectangular bounding box (BB) and a confidence score for its presence.
Fig. 2 Generic object detection
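Each detection can be represented as a class label, a confidence score and a rectangular box. The Python sketch below assumes boxes are stored as (x_min, y_min, x_max, y_max) pixel coordinates and adds intersection over union (IoU), the usual measure of how well a predicted box overlaps a ground-truth box. The label and coordinates are made up for illustration.

```python
from typing import Tuple

# A box as (x_min, y_min, x_max, y_max) in pixel coordinates (an assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (width/height may be negative if disjoint).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A hypothetical detection: (class label, confidence, box).
detection = ("dog", 0.91, (48.0, 30.0, 210.0, 190.0))
ground_truth = (50.0, 35.0, 200.0, 185.0)
print(round(iou(detection[2], ground_truth), 3))  # 0.868
```

An IoU of 1.0 means the boxes coincide exactly; 0.0 means they do not overlap at all.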

3. Basic architecture of CNN
A Convolutional Neural Network (CNN) is a deep learning algorithm that takes in an input image, assigns importance to various aspects/objects in the image, and learns to differentiate one from another.[2]
Fig. 3 Basic architecture of CNN

4. Building the CNN
● Convolution
● Pooling
● Flattening

4.1 Convolution
● Convolution preserves the spatial relationship between pixels
by learning image features using small squares of input data.
FIG. 4.1 Convolution
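A minimal NumPy sketch of a single-channel convolution with stride 1 and no padding. As in most CNN libraries, it computes cross-correlation (the kernel is not flipped). The tiny image and the edge-detecting kernel are hypothetical examples, not learned values.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D convolution (no padding, stride 1) of a single-channel image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Each output value depends only on a small square patch of the input,
            # which is how convolution preserves the spatial relationship between pixels.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
# Hypothetical vertical-edge kernel.
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d(image, kernel))  # every row is [0, 2, 0]: the kernel fires on the 0 -> 1 edge
```

In a real CNN the kernel values are learned during training and each layer applies many such kernels across all input channels.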

4.2 POOLING
● Pooling reduces the dimensionality of each feature map but retains the most important information (for example, max pooling keeps only the largest value in each window).
FIG. 4.2 POOLING
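As a sketch, assuming non-overlapping 2×2 max pooling (window size equal to stride), the operation can be written with a NumPy reshape. The feature-map values are hypothetical.

```python
import numpy as np


def max_pool2d(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling (window = stride = size) over a 2D feature map."""
    h, w = feature_map.shape
    # Drop any rows/columns that do not fill a complete window.
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    # Keep only the strongest activation in each window.
    return blocks.max(axis=(1, 3))


fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
print(max_pool2d(fm))  # [[4, 2], [2, 8]]
```

The 4×4 map shrinks to 2×2, a quarter of the values, while the largest activation in each region survives.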

4.3 FLATTENING
● Here the pooled feature maps are converted into a single linear array so that they can be fed into the fully connected layers of the neural network.
FIG. 4.3 FLATTENING
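A small NumPy sketch of flattening followed by one fully connected layer. The 2×2×3 pooled output, the four output units and the random weights are arbitrary placeholders chosen for illustration.

```python
import numpy as np

# Hypothetical pooled output: a 2x2 spatial grid with 3 channels (height, width, channels).
pooled = np.arange(12).reshape(2, 2, 3)

# Flattening: turn the 3D feature maps into one linear vector.
flat = pooled.reshape(-1)
print(flat.shape)  # (12,)

# The vector can now feed a fully connected layer; weights are random placeholders.
rng = np.random.default_rng(0)
weights = rng.normal(size=(flat.size, 4))  # 4 output units, an arbitrary choice
bias = np.zeros(4)
dense_out = flat @ weights + bias
print(dense_out.shape)  # (4,)
```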

5. Dataset & Preprocessing
COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dataset containing about 330K images, of which more than 200K are labelled.[3]
5.1 Features of dataset
· Object segmentation
· Recognition in context
· 330K images (>200K labeled)
· 1.5 million object instances
· 80 object categories
· 91 stuff categories
5.2 Data Preprocessing
● Since the model is pre-trained on COCO, no dataset preprocessing or training is needed; each input image only has to be converted into the array format the model expects.
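Even with a pre-trained model, an input image is usually turned into a batched array before inference. The sketch below assumes the detector expects a uint8 array of shape [1, H, W, 3], which is what models exported with the TensorFlow Object Detection API typically take; check the input signature of the model you actually load. The helper name `to_model_input` is hypothetical.

```python
import numpy as np


def to_model_input(image) -> np.ndarray:
    """Turn one H x W x 3 RGB image into a batch of one uint8 image.

    Assumes pixel values are already in the 0..255 range.
    """
    image = np.asarray(image)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    # Cast to uint8 and add a leading batch dimension: [1, H, W, 3].
    return np.expand_dims(image.astype(np.uint8), axis=0)


frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a loaded image
batch = to_model_input(frame)
print(batch.shape, batch.dtype)  # (1, 480, 640, 3) uint8
```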

6. What is SSD?
● SSD (Single Shot Detector) is a single-stage network designed for real-time object detection: it predicts bounding boxes and class scores in one forward pass.
FIG. 5 Single Shot Detector

7. Object detection using SSD algorithm.
● It is a three-step process:
1. Region proposal (in SSD, a fixed set of default boxes rather than a separate proposal network)
2. Feature generation
3. Classification
FIG. 6 Object detection using SSD

8. SSD FRAMEWORK
● Multi-scale feature maps for detection.
● Convolutional predictors for detection.
● Default boxes and aspect ratios.
FIG. 7 SSD FRAMEWORK
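The "default boxes and aspect ratios" idea can be sketched in Python. Following the SSD paper, feature map k of m gets a scale s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9; each box has width s_k·sqrt(a) and height s_k/sqrt(a) for aspect ratio a, plus one extra square box of size sqrt(s_k·s_{k+1}). The reduced aspect-ratio set (1, 2, 1/2), the choice of six feature maps and the 3×3 map are assumptions made here for brevity.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def default_box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Scales for m >= 2 feature maps, spaced linearly from s_min to s_max."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]


def default_boxes(fmap_size: int, scale: float, next_scale: float,
                  aspect_ratios: Sequence[float] = (1.0, 2.0, 0.5)) -> List[Box]:
    """Default boxes for one square fmap_size x fmap_size feature map."""
    boxes: List[Box] = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centres sit in the middle of each feature-map cell.
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Width s*sqrt(a), height s/sqrt(a): the area stays s^2.
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
            # One extra square box at the intermediate scale sqrt(s_k * s_{k+1}).
            extra = math.sqrt(scale * next_scale)
            boxes.append((cx, cy, extra, extra))
    return boxes


scales = default_box_scales(6)                  # six feature maps (assumed)
boxes = default_boxes(3, scales[4], scales[5])  # a hypothetical 3x3 map
print(len(boxes))  # 36 = 3 * 3 cells * 4 boxes per cell
```

Coarse feature maps receive large scales and fine maps small ones, which is how the multi-scale feature maps cover objects of different sizes.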

9. Feature extraction
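● In this stage, the input image is resized to a fixed resolution and the base network (MobileNet in SSD with MobileNet) extracts feature maps for the whole image in one pass; unlike R-CNN, SSD does not crop or warp each region proposal separately.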
FIG. 8 Feature extraction

10. Classification and Localization
● Each region is classified into every category using the feature maps produced by the MobileNet V1 architecture, and the box coordinates of the scored regions are then adjusted with bounding box regression.
● This architecture uses depthwise separable convolutions, which significantly reduce the number of parameters compared with a network that uses standard convolutions.
FIG. 9 Depthwise Separable Convolution
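Bounding box regression can be sketched with the box parameterization used in the SSD paper: the network predicts centre offsets scaled by the default box's width and height, plus log-ratios of the sizes. Many implementations additionally divide these offsets by fixed "variance" constants; that detail is omitted here. Boxes are assumed to be (cx, cy, w, h) in normalized coordinates, and the example boxes are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized coordinates


def encode(gt: Box, default: Box) -> Box:
    """Regression target: offsets of a ground-truth box relative to a default box."""
    gcx, gcy, gw, gh = gt
    dcx, dcy, dw, dh = default
    return ((gcx - dcx) / dw, (gcy - dcy) / dh,
            math.log(gw / dw), math.log(gh / dh))


def decode(offsets: Box, default: Box) -> Box:
    """Apply predicted offsets to a default box (the inverse of encode)."""
    lcx, lcy, lw, lh = offsets
    dcx, dcy, dw, dh = default
    return (dcx + lcx * dw, dcy + lcy * dh,
            dw * math.exp(lw), dh * math.exp(lh))


default = (0.5, 0.5, 0.2, 0.2)  # hypothetical default box
gt = (0.52, 0.48, 0.25, 0.10)   # hypothetical ground-truth box
offsets = encode(gt, default)
refined = decode(offsets, default)
print([round(v, 3) for v in offsets])  # [0.1, -0.1, 0.223, -0.693]
```

During training `encode` produces the targets; at inference time the predicted offsets are passed to `decode` to obtain the adjusted boxes.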

11. MobileNet V1 Architecture
● It uses depthwise separable convolutions to reduce model size and complexity.
● Smaller model size: fewer parameters.
● Lower complexity: fewer multiplications and additions (Mult-Adds).
Fig. 10 MobileNet V1 Architecture
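The savings can be checked with simple arithmetic. For a k×k kernel, m input channels, n output channels and an f×f output map (stride 1, biases ignored), a standard convolution has k·k·m·n weights, while a depthwise separable one has k·k·m (depthwise) plus m·n (pointwise). The ratio of separable to standard cost is 1/n + 1/k², so 3×3 kernels give roughly 8 to 9 times fewer Mult-Adds. The layer size below is a hypothetical example.

```python
from typing import Tuple


def standard_conv_cost(k: int, m: int, n: int, f: int) -> Tuple[int, int]:
    """(weights, mult-adds) for a k x k standard convolution.

    m input channels, n output channels, f x f output feature map
    (stride 1, biases ignored).
    """
    weights = k * k * m * n
    return weights, weights * f * f  # every weight is used once per output position


def separable_conv_cost(k: int, m: int, n: int, f: int) -> Tuple[int, int]:
    """(weights, mult-adds) for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * m  # one k x k filter per input channel
    pointwise = m * n      # 1 x 1 convolution that mixes channels
    weights = depthwise + pointwise
    return weights, weights * f * f


# Hypothetical layer: 3x3 kernels, 256 -> 256 channels, 14x14 output.
std_w, std_ops = standard_conv_cost(3, 256, 256, 14)
sep_w, sep_ops = separable_conv_cost(3, 256, 256, 14)
print(std_w, sep_w)                 # 589824 67840
print(std_ops, sep_ops)             # 115605504 13296640
print(round(std_ops / sep_ops, 2))  # 8.69
```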

12. Advantages of MobileNet V1 Architecture
● Its main advantage is high accuracy on image recognition problems relative to its small size.
● It is fast: inference takes less time because fewer computations are needed.
● It improves the quality of candidate bounding boxes.

13. Tools And Libraries
● Anaconda - a free and open-source distribution of the Python and R programming languages for data science and machine learning applications.
● Spyder - an open-source, cross-platform IDE for scientific programming in the Python language.
● TensorFlow - an open-source software library for dataflow programming across a range of tasks.
● NumPy - short for ‘Numerical Python’, the core Python library for scientific computing. It provides a powerful n-dimensional array object and tools for integrating C/C++ code.
● Matplotlib - a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
● Urllib - a Python module for opening URLs. It defines functions and classes that help with URL actions, so Python code can access and retrieve data from the internet, such as XML, HTML, and JSON.

References
1. Zhong-Qiu Zhao, Peng Zheng, Shou-Tao Xu, and Xindong Wu (2016)
2. https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
3. http://cocodataset.org/#home
LINKS TO FIGURES:
1.
2. https://towardsdatascience.com/going-deep-into-object-detection-bed442d92b34
3. https://medium.com/datadriveninvestor/convolutional-neural-network-cnn-simplified-ecafd4ee52c5
4. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
5. https://www.researchgate.net/figure/The-architecture-of-Single-Shot-Multibox-Detector-SSD-It-considers-only-two-stage-by_fig9_327491507
6. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg (2016)
7. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: Integrated recognition, localization and detection using convolutional networks. In: ICLR (2014)
8. https://towardsdatascience.com/cnn-application-on-structured-data-automated-feature-extraction-8f2cd28d9a7e
9. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
10. https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
https://machinethink.net/blog/object-detection/
