Improving Region based CNN object detector using Bayesian Optimization

AMGAD MUHAMMAD

Agenda
• Background
• Problem definition
• Proposed solution
• Baseline with an example

Background: Deformable Parts Model
• Strong low-level features based on
histograms of oriented gradients (HOG)
• Efficient matching algorithms for deformable part-
based models (pictorial structures)
• Discriminative learning with latent variables (latent
SVM)
• Where to look? Everywhere (the sliding window
approach)
• mean Average Precision (mAP): 33.7% - 33.4%
P.F. Felzenszwalb et al., “Object Detection with Discriminatively Trained Part-Based Models”, PAMI 2010.
J.J. Lim et al., “Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection”, CVPR 2013.
X. Ren et al., “Histograms of Sparse Codes for Object Detection”, CVPR 2013.
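
To give a feel for what "everywhere" costs, here is a minimal sketch that counts sliding-window positions. The image size, stride, and window sizes are illustrative assumptions, not values from the cited papers.

```python
def count_sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Number of window positions for one window size on one image."""
    if win_w > img_w or win_h > img_h:
        return 0
    nx = (img_w - win_w) // stride + 1
    ny = (img_h - win_h) // stride + 1
    return nx * ny

# Hypothetical settings: a 500x375 image, an 8-pixel stride,
# and a handful of window sizes (width, height).
window_sizes = [(64, 64), (128, 64), (64, 128), (128, 128), (256, 256)]
total = sum(count_sliding_windows(500, 375, w, h, 8) for w, h in window_sizes)
print(total)
```

Even this coarse grid yields thousands of windows per image, and every one of them has to be scored.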

Background: Selective search
• Alternative to exhaustive search
with sliding window.
• Starting with over-segmentation,
merge similar regions and produce region
proposals.
van de Sande et al., “Segmentation as Selective Search for Object Recognition”, ICCV 2011.

Deep Learning happened, again!
Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012.
ImageNet 2012: whole-image classification with 1000 categories (error rates)
Model                  Top-1 (val)   Top-5 (val)   Top-5 (test)
1 CNN                  40.7%         18.2%         -
5 CNNs                 38.1%         16.4%         16.4%
1 CNN (pre-trained)    39.0%         16.6%         -
7 CNNs (pre-trained)   36.7%         15.4%         15.3%
• Can it be used in object recognition?
• Problems:
• localization: Where is the object?
• annotation: Labeled data is scarce.
• Expensive Computation for dense
search.

R-CNN: Region proposals + CNN
Approach summary
• localization: selective search
• feature extraction: deep learning (CNN)
• classification: binary linear SVM

R-CNN
Girshick et al. CVPR14.
1. Input image
2. Regions of Interest (RoI) from a proposal method (~2k)
3. Warped image regions
4. Forward each region through ConvNet
5. Classify regions with SVMs
6. Apply bounding-box regressors
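
The per-region flow above (crop, warp, extract features, score with an SVM) can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the image is a plain 2D list, warping is nearest-neighbor, and `extract_features` is a hypothetical stand-in for the ConvNet. It scores a single class, while R-CNN trains one binary SVM per class.

```python
def crop_and_warp(image, box, out_h, out_w):
    """Crop box=(x0, y0, x1, y1) from a 2D list and resize it to
    out_h x out_w with nearest-neighbor sampling (aspect ratio not kept)."""
    x0, y0, x1, y1 = box
    box_h, box_w = y1 - y0, x1 - x0
    warped = []
    for i in range(out_h):
        src_y = y0 + i * box_h // out_h
        warped.append([image[src_y][x0 + j * box_w // out_w]
                       for j in range(out_w)])
    return warped

def detect(image, proposals, extract_features, svm_weights, svm_bias, size):
    """Score every proposal: warp, extract features, apply a linear SVM."""
    scores = []
    for box in proposals:
        patch = crop_and_warp(image, box, size, size)
        feats = extract_features(patch)  # hypothetical ConvNet stand-in
        scores.append(sum(w * f for w, f in zip(svm_weights, feats)) + svm_bias)
    return scores

def flatten(patch):
    """Toy feature extractor: the raw pixels of the warped patch."""
    return [v for row in patch for v in row]

# Toy 6x8 "image" and two proposals.
image = [[x + 10 * y for x in range(8)] for y in range(6)]
scores = detect(image, [(0, 0, 4, 4), (2, 1, 6, 5)], flatten, [1.0] * 4, 0.0, size=2)
```

Note that the feature extractor runs once per proposal, which is why roughly 2k proposals mean roughly 2k forward passes per image.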

What’s wrong with R-CNN?
• Ad hoc training objectives
• Fine-tune network with softmax classifier (log loss)
• Train post-hoc linear SVMs (hinge loss)
• Train post-hoc bounding-box regressors (squared loss)
• Training is slow (84h), takes a lot of disk space
• Inference (detection) is slow
• 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]
• ~2000 ConvNet forward passes per image
• Fixed by SPP-net [He et al. ECCV14]
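
To make the "ad hoc training objectives" point concrete, here are the three losses written side by side as a minimal sketch for a single example. The function names and argument layout are assumptions for illustration, and any regularization terms are left out.

```python
import math

def log_loss(class_scores, true_class):
    """Softmax cross-entropy for one example (network fine-tuning)."""
    m = max(class_scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in class_scores))
    return log_sum - class_scores[true_class]

def hinge_loss(score, label):
    """Hinge loss for one binary linear SVM; label is +1 or -1."""
    return max(0.0, 1.0 - label * score)

def squared_loss(predicted, target):
    """Sum of squared errors over the box regression targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))
```

Each stage optimizes a different objective, and none of the later stages can influence the features learned in the first one.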

SPP-net
He et al. ECCV14.
1. Input image
2. Forward whole image through ConvNet to get the “conv5” feature map of the image
3. Regions of Interest (RoIs) from a proposal method
4. Spatial Pyramid Pooling (SPP) layer
5. Fully-connected layers (FCs)
6. Classify regions with SVMs
7. Apply bounding-box regressors
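
A minimal sketch of the pooling step, assuming a single-channel feature map stored as a 2D list and RoIs already expressed in feature-map coordinates. The pyramid levels are illustrative. The point it shows is that RoIs of different sizes produce vectors of the same length, so the ConvNet only has to run once per image.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)

def spatial_pyramid_pool(feature_map, roi, levels=(1, 2)):
    """Max-pool inside roi=(x0, y0, x1, y1) on an n x n grid for each n
    in levels and concatenate the results into one fixed-length list."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for n in levels:
        for i in range(n):
            # Bins may overlap slightly; each one covers at least one cell.
            ys, ye = y0 + (i * h) // n, y0 + ceil_div((i + 1) * h, n)
            for j in range(n):
                xs, xe = x0 + (j * w) // n, x0 + ceil_div((j + 1) * w, n)
                pooled.append(max(feature_map[y][x]
                                  for y in range(ys, ye)
                                  for x in range(xs, xe)))
    return pooled

# Toy 5x6 feature map; two RoIs of different sizes.
feature_map = [[r * 6 + c for c in range(6)] for r in range(5)]
large = spatial_pyramid_pool(feature_map, (0, 0, 6, 5))
small = spatial_pyramid_pool(feature_map, (1, 1, 4, 3))
```

Both `large` and `small` have 1 + 4 = 5 entries, regardless of the RoI size.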

What’s good about SPP-net?
• Fixes one issue with R-CNN: makes testing fast
• Image-wise computation (shared): ConvNet
• Region-wise computation: FCs, SVMs, Bbox reg

What’s wrong with SPP-net?
• Inherits the rest of R-CNN’s problems
• Ad hoc training objectives
• Introduces a new problem: cannot update
parameters below SPP layer during training

SPP-net: the main limitation
He et al. ECCV14.
• Trainable (3 layers): FCs
• Frozen (13 layers): ConvNet
• SPP-net training does not back-propagate through the SPP layer

Fast R-CNN
• Fast test-time, like SPP-net
• One network, trained in one stage
• Higher mean average precision than R-CNN and SPP-net

Fast R-CNN (test time)
1. Input image
2. ConvNet, with Regions of Interest (RoIs) from a proposal method
3. “RoI Pooling” (single-level SPP) layer
4. Fully-connected layers (FCs)
5. Linear + softmax: softmax classifier
6. Linear: bounding-box regressors

Fast R-CNN (training)
• ConvNet, then FCs, then two output layers: Linear + softmax, and Linear
• Multi-task loss: log loss + smooth L1 loss
• Trainable: the whole network, including the ConvNet
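
The multi-task loss can be sketched for a single RoI as follows. The weighting factor `lam` and the convention that class 0 is background (and contributes no box term) follow the formulation in the Fast R-CNN paper and are not spelled out on the slides, so treat them as assumptions here.

```python
import math

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multi_task_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """Log loss on the true class plus smooth L1 on the box offsets.
    true_class == 0 is treated as background, which has no box term."""
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    return cls_loss + lam * loc_loss
```

Because both terms are summed into one scalar, a single backward pass updates the classifier, the box regressor, and the shared layers together.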

What is missing from the previous
architectures?
• All the previous architectures rely on an external region
proposal algorithm.
• Proposed regions are independent of the network loss.
• No control over the quality of the regions.

Faster R-CNN
• Fast test-time, like Fast R-CNN
• Higher mean average precision than R-CNN, SPP-net, and Fast R-CNN
• Has a dedicated Region Proposal Network (RPN) trained to
optimize the network loss.

Faster R-CNN
1. Input image
2. Forward whole image through the RPN ConvNet
3. RPN outputs: Linear + softmax, Linear
4. Detection outputs after the FCs: Linear + softmax (softmax classifier), Linear (bounding-box regressors)
• Both the RPN and the detection network are trainable
• Super efficient: shared weights between detection and Region Proposal network

Problem definition
• All region-based CNN object detectors depend on the quality of
the region proposal algorithm.
• Although the region proposal network in Faster R-CNN is trained
to minimize a multi-task loss function (log loss and bounding-box
regression), in my experiments the best proposed regions are still
ill-localized.

Problem definition (example)
[Figures: top 1, top 3, top 5, and top 100 proposed regions]

Better regions with Bayesian
Optimization
Now the goal becomes sampling a new solution $y_{n+1}$ with a
high chance that it will maximize the value of $f_{n+1}$.

Optimization
Given the ability to query our CNN for region scores,
we can repeat the following:
1. Given existing regions/scores
2. We fit a model
3. Introduce the chance utility function
4. Locate the maximum of the utility
5. Observe the new region score
6. Update the model
7. Repeat from step 2
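
The loop above can be sketched end to end on a toy problem. Everything specific here is an assumption for illustration: the model is a zero-mean Gaussian process with an RBF kernel, the utility is expected improvement (EI), the search space is a 1D grid on [0, 1], and `score_fn` is a hypothetical stand-in for querying the CNN with a region (a real box has 4 coordinates).

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel on scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_posterior(xs, ys, x, noise=1e-4):
    """Posterior mean and standard deviation of the GP at point x."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Closed-form EI of a Gaussian prediction over the best score so far."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mean - best) * cdf + std * pdf

def bayes_opt(score_fn, xs, n_iter=10, grid=101):
    xs = list(xs)
    ys = [score_fn(x) for x in xs]            # step 1: existing regions/scores
    for _ in range(n_iter):
        best = max(ys)
        candidates = [i / (grid - 1) for i in range(grid)]
        candidates = [x for x in candidates if x not in xs]
        # Steps 2-4: fit the model, evaluate the utility, locate its maximum.
        x_new = max(candidates, key=lambda x: expected_improvement(
            *gp_posterior(xs, ys, x), best))
        xs.append(x_new)                      # step 5: observe the new score
        ys.append(score_fn(x_new))            # step 6: model is refit next pass
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]

# Toy score with its maximum at x = 0.7, starting from three "proposals".
x_best, y_best = bayes_opt(lambda x: -(x - 0.7) ** 2, [0.1, 0.5, 0.9], n_iter=5)
```

Refitting the GP from scratch for every candidate is wasteful but keeps the sketch short; a practical version would factor the kernel matrix once per iteration.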

Example of BO applied
to R-CNN
Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, and Honglak Lee.

Original image

Initial detection (local optima)

Initial detection & ground truth
Neither gives good localization

Iteration 1: Boxes inside the local search region

Iteration 1: Heat map of expected improvement (EI)
• A box has 4 coordinates:
(centerX, centerY, height, width)
• The height and width are marginalized
by max to visualize EI in 2D

Iteration 1: Maximum of EI: the newly proposed box

Iteration 2: Local optimum & search region

Iteration 2: EI heat map & new proposal

Iteration 2: Newly proposed box & its actual score

Iteration 3: Local optimum & search region

Iteration 3: EI heat map & new proposal

Iteration 3: Newly proposed box & its actual score
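
The 2D heat maps come from a 4D quantity, so height and width have to be collapsed somehow. A minimal sketch of the max-marginalization, assuming EI has already been evaluated on a discrete grid and stored in a dictionary keyed by hypothetical grid indices (cx, cy, h, w):

```python
def marginalize_by_max(ei, n_x, n_y):
    """ei maps (cx, cy, h, w) grid indices to an EI value.
    Returns an n_y x n_x heat map holding, for each center,
    the maximum EI over all heights and widths."""
    heat = [[0.0] * n_x for _ in range(n_y)]  # EI is never negative
    for (cx, cy, h, w), value in ei.items():
        heat[cy][cx] = max(heat[cy][cx], value)
    return heat

# Toy grid: 2x2 centers, a few box sizes per center.
ei = {(0, 0, 0, 0): 0.1, (0, 0, 1, 0): 0.4,
      (1, 0, 0, 0): 0.2, (1, 1, 0, 1): 0.3}
heat = marginalize_by_max(ei, 2, 2)
```

Each heat-map cell then answers "how promising is the best box centered here", which is what the plots visualize.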
