Deep learning based object detection basics

Detection As Classification
CAT? NO
DOG? NO

CAT? YES
DOG? NO

CAT? NO
DOG? YES

From Classification To Detection
Classification Head:
● C+1 scores: C classes + 1 background class
Localization Head:
● Class agnostic: (x,y,w,h)
● Class specific: (x,y,w,h) × C
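
The output sizes of the two heads follow directly from the bullets above. A minimal sketch, assuming a hypothetical helper named `head_output_sizes` (the name and signature are illustrative, not from the slides):

```python
def head_output_sizes(num_classes, class_specific):
    """Return (number of class scores, number of box coordinates)."""
    num_scores = num_classes + 1                 # C classes + 1 background class
    num_boxes = num_classes if class_specific else 1
    return num_scores, 4 * num_boxes             # each box is (x, y, w, h)

# Example with C = 20 classes.
agnostic = head_output_sizes(20, class_specific=False)
specific = head_output_sizes(20, class_specific=True)
print(agnostic, specific)
```

With C = 20, the class-agnostic head predicts one box (4 values), while the class-specific head predicts one box per class (80 values).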

● Training
○ Crop random regions from images.
○ Scale to uniform size.
○ A region is labeled according to overlap with ground truth labeling.
○ Optimize using Stochastic Gradient Descent.
○ Handle class imbalance by resampling.
● Detection
○ Use sliding window to go over image.
○ Crop regions.
○ Scale to uniform size.
○ Apply network to all cropped images.
○ Repeat process for different image scales.
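
The detection loop above can be sketched as a generator of candidate windows. This is only an illustration: the window size, stride and scale values are assumptions chosen for the example, and `sliding_windows` is a hypothetical name. Each yielded box would then be cropped, resized and passed to the network.

```python
def sliding_windows(img_w, img_h, win=64, stride=32, scales=(1.0, 0.5)):
    """Yield (x, y, w, h) boxes in original-image coordinates."""
    for s in scales:
        # Size of the image after rescaling by factor s.
        scaled_w, scaled_h = int(img_w * s), int(img_h * s)
        for y in range(0, scaled_h - win + 1, stride):
            for x in range(0, scaled_w - win + 1, stride):
                # Map the fixed-size window back to the original image.
                yield (x / s, y / s, win / s, win / s)

windows = list(sliding_windows(128, 128))
print(len(windows))
```

Even this small 128×128 example produces ten windows over two scales; realistic image sizes, strides and scale pyramids produce far more, which is why the number of network evaluations grows so quickly.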

How To Handle So Many Detections?
● Problem:
○ Running this algorithm at many locations and many scales results in many detections.
● Solution:
○ Suppress the weaker detections that overlap a stronger one.

Non-Maximum Suppression (NMS)
● Start with most confident detection D.
● Measure IoU with all other detections.
● Remove detections whose IoU with D exceeds 50%.
● Repeat with the next most confident remaining detection.
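
A minimal sketch of the greedy procedure above, in plain Python. It assumes boxes are given as (x, y, w, h) with (x, y) the top-left corner, which the slides do not specify; `iou` and `nms` are illustrative names.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Return indices of the detections kept, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        d = order.pop(0)                 # most confident remaining detection D
        keep.append(d)
        # Drop every detection that overlaps D by more than the threshold.
        order = [i for i in order if iou(boxes[d], boxes[i]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 10, 10)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed; the third box does not overlap and survives. In a multi-class detector this is typically run separately per class.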

● Problem:
○ Previous method was too slow.
○ Network is applied over and over.
● Solution:
○ Sliding window is inherently efficient in the case of CNNs: convolutions over the full image share computation between overlapping windows.
● OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (2013)
○ Rob Fergus, Yann LeCun
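
Why the sliding window comes almost for free with a CNN can be checked on a toy example. The sketch below is not OverFeat's network: it is a made-up two-layer "network" (two 3×3 filters with a ReLU in between, stride 1, no padding) whose receptive field is 5×5. Running it once on the whole image gives the same scores as cropping every 5×5 window and running it on each crop, while computing the first layer only once.

```python
def correlate_valid(img, k):
    """'Valid' 2-D cross-correlation of a list-of-lists image with kernel k."""
    kh, kw = len(k), len(k[0])
    return [[sum(img[y + i][x + j] * k[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(len(img[0]) - kw + 1)]
            for y in range(len(img) - kh + 1)]

def relu(fmap):
    return [[max(0, v) for v in row] for row in fmap]

# Arbitrary illustrative filters.
K1 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
K2 = [[0, 1, 0], [1, -3, 1], [0, 1, 0]]

def tiny_net(img):
    """Two stacked 3x3 layers: a 5x5 input yields a single score."""
    return correlate_valid(relu(correlate_valid(img, K1)), K2)

def crop(img, x, y, size):
    return [row[x:x + size] for row in img[y:y + size]]

image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]

# Fast way: one pass over the 6x6 image gives a 2x2 map of window scores.
dense = tiny_net(image)

# Slow way: crop each 5x5 window and apply the network to every crop.
per_crop = [[tiny_net(crop(image, x, y, 5))[0][0] for x in range(2)]
            for y in range(2)]

print(dense == per_crop)
```

The dense pass evaluates the first layer at 16 positions in total, whereas the four crops together evaluate it at 36, and the gap widens with image size and network depth.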

From Detection To Classification

CNNs Are Still Too Slow
● Problem:
○ Need to test many positions and scales, and use a computationally demanding classifier (CNN)
● Solution:
○ Only look at a tiny subset of possible positions.
● Rich feature hierarchies for accurate object detection and semantic segmentation (2014)
○ AKA R-CNN
○ Ross Girshick

Region Proposals
● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
● Look for “blob-like” regions

Region Proposals: Selective Search

Region Proposals: Many Other Choices

R-CNN: Training
1. Train a classification model on a large dataset (ImageNet)
2. Fine-tune model for detection on a smaller dataset (Pascal)
○ Instead of 1000 ImageNet classes, now use 20 classes + background class.
○ Extract region proposals for all images.
○ Use positive / negative regions from detection images.
■ If proposal has >50% IoU with any ground truth → Positive example.
■ Otherwise → Negative example.
■ Batch = 32 positives + 96 negatives.
3. Train final classifiers
○ Extract region proposals for all images.
○ For each region: crop and warp to CNN size, run forward pass, save features to disk (requires ~200GB for Pascal dataset).
○ Train one binary SVM per class to classify region features.
○ Train one linear regression model per class to predict bounding-box offsets.
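
The labeling and batching rule from step 2 can be sketched as follows. This is an illustration rather than the R-CNN code: boxes are assumed to be (x, y, w, h) with a top-left origin, the helper names are hypothetical, and sampling with replacement is an assumption made so that small pools still fill the batch.

```python
import random

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def label_proposals(proposals, gt_boxes, thresh=0.5):
    """Split proposal indices into positives (>50% IoU with any GT) and negatives."""
    pos, neg = [], []
    for i, p in enumerate(proposals):
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        (pos if best > thresh else neg).append(i)
    return pos, neg

def sample_batch(pos, neg, n_pos=32, n_neg=96, rng=random):
    """Draw 32 positives + 96 negatives (with replacement; pools must be non-empty)."""
    return rng.choices(pos, k=n_pos), rng.choices(neg, k=n_neg)

gt = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
pos, neg = label_proposals(proposals, gt)
batch_pos, batch_neg = sample_batch(pos, neg)
print(pos, neg, len(batch_pos), len(batch_neg))
```

Fixing the ratio at 32:96 keeps positives at a quarter of every batch even though negatives vastly outnumber them among the proposals.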

R-CNN: 2014’s State Of The Art

Looking for brilliant researchers
cv@brodmann17.com
