Structured Regression for
            Efficient Object Detection

                        Christoph Lampert
                   www.christoph-lampert.org

            Max Planck Institute for Biological Cybernetics, Tübingen


                          December 3rd, 2009




• [C.L., Matthew B. Blaschko, Thomas Hofmann. CVPR 2008]
• [Matthew B. Blaschko, C.L. ECCV 2008]
• [C.L., Matthew B. Blaschko, Thomas Hofmann. PAMI 2009]
Category-Level Object Localization




             What objects are present? person, car
Category-Level Object Localization




                    Where are the objects?
Object Localization ⇒ Scene Interpretation




    A man inside of a car       A man outside of a car
    ⇒ He’s driving.             ⇒ He’s passing by.
Algorithmic Approach: Sliding Window




       f(y1) = 0.2         f(y2) = 0.8        f(y3) = 1.5

   Use a (pre-trained) classifier function f :
     • Place candidate window on the image.
     • Iterate:
            Evaluate f and store result.
            Shift candidate window by k pixels.
     • Return position where f was largest.
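The loop just described can be sketched in a few lines (a minimal single-scale sketch; the scoring function `f`, window size, and stride are illustrative placeholders):

```python
def sliding_window(image_width, image_height, f, window=(64, 64), stride=8):
    """Exhaustive single-scale sliding window: score every grid position
    and return the highest-scoring box (l, t, r, b)."""
    w, h = window
    best_score, best_box = float("-inf"), None
    for top in range(0, image_height - h + 1, stride):
        for left in range(0, image_width - w + 1, stride):
            box = (left, top, left + w, top + h)
            score = f(box)                 # evaluate the pre-trained classifier
            if score > best_score:
                best_score, best_box = score, box
    return best_box, best_score
```

Scanning additional scales and aspect ratios repeats the whole double loop, which is exactly the cost discussed next.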
Algorithmic Approach: Sliding Window




       f(y1) = 0.2       f(y2) = 0.8      f(y3) = 1.5

   Drawbacks:
     • single scale, single aspect ratio
       → repeat with different window sizes/shapes
     • search on grid
       → speed–accuracy tradeoff
     • computationally expensive
New view: Generalized Sliding Window




   Assumptions:
     • Objects are rectangular image regions of arbitrary size.
     • The score of f is largest at the correct object position.

   Mathematical Formulation:

                       yopt = argmax_{y ∈ Y} f(y)


   with Y = {all rectangular regions in the image}
New view: Generalized Sliding Window




   Mathematical Formulation:

                        yopt = argmax_{y ∈ Y} f(y)


   with Y = {all rectangular regions in the image}

   • How to choose/construct/learn the function f?
   • How to do the optimization efficiently and robustly?
     (exhaustive search is too slow: Y has O(w²h²) elements)
New view: Generalized Sliding Window


   Use the problem’s geometric structure:
                                             • Calculate scores for
                                               sets of boxes jointly.

                                             • If no element can
                                               contain the maximum,
                                               discard the box set.

                                             • Otherwise, split the
                                               box set and iterate.

                                            → Branch-and-bound
                                              optimization
   • finds global maximum yopt
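The scheme above can be written as a generic best-first branch-and-bound loop; `upper_bound`, `split`, and `is_single` are problem-specific callbacks (hypothetical names), and a priority queue always expands the candidate set with the largest bound:

```python
import heapq
from itertools import count

def branch_and_bound(root, upper_bound, split, is_single):
    """Best-first branch-and-bound: returns the global maximizer of f,
    provided upper_bound(S) >= max of f over S, with equality on singletons."""
    tie = count()                        # tie-breaker so states never compare
    heap = [(-upper_bound(root), next(tie), root)]
    while heap:
        neg_bound, _, state = heapq.heappop(heap)
        if is_single(state):
            return state, -neg_bound     # bound is exact here: global maximum
        for part in split(state):
            heapq.heappush(heap, (-upper_bound(part), next(tie), part))
```

The same loop works for any state representation; the box-set version only has to supply the three callbacks.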
Representing Sets of Boxes

     • Boxes: [l, t, r, b] ∈ R⁴. Box sets: [L, T, R, B] ∈ (R²)⁴, i.e. each coordinate is an interval of values.




   Splitting:
     • Identify largest interval. Split at center: R → R1 ∪R2 .
     • New box sets: [L, T, R1 , B] and [L, T, R2 , B].
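A sketch of this representation in code, with each coordinate stored as an inclusive integer interval `(lo, hi)`:

```python
def split_boxset(boxset):
    """Split a box set [L, T, R, B], each coordinate an interval (lo, hi),
    at the center of its largest interval."""
    widths = [hi - lo for (lo, hi) in boxset]
    k = widths.index(max(widths))        # coordinate with the widest interval
    lo, hi = boxset[k]
    mid = (lo + hi) // 2
    first, second = list(boxset), list(boxset)
    first[k] = (lo, mid)                 # e.g. R -> R1
    second[k] = (mid + 1, hi)            # e.g. R -> R2
    return tuple(first), tuple(second)
```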
Calculating Scores for Box Sets

   Example: Linear Support Vector Machine, f(y) := Σ_{pi ∈ y} wi.

        f_upper(Y) = Σ_{pi ∈ y∩} min(0, wi) + Σ_{pi ∈ y∪} max(0, wi)

   where y∩ is the intersection and y∪ the union of all boxes in the set Y.

   Can be computed in O(1) using integral images.
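A possible NumPy sketch of this bound. `weights` is the per-pixel SVM weight image; `y_inter` and `y_union` are the intersection and union boxes of the box set, in inclusive pixel coordinates (all names are illustrative):

```python
import numpy as np

def integral_image(a):
    """2-D prefix sums padded with a leading zero row/column,
    so any box sum is four lookups."""
    return np.pad(a.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def box_sum(ii, box):
    l, t, r, b = box                     # inclusive pixel coordinates
    return ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l]

def svm_upper_bound(weights, y_inter, y_union):
    """f_upper(Y): negative weights summed over the intersection box y∩,
    positive weights summed over the union box y∪."""
    ii_pos = integral_image(np.maximum(weights, 0))
    ii_neg = integral_image(np.minimum(weights, 0))
    return box_sum(ii_neg, y_inter) + box_sum(ii_pos, y_union)
```

In practice the two integral images are computed once per image, so each bound evaluation is indeed O(1).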
Calculating Scores for Box Sets
   Histogram Intersection Similarity: f(y) := Σ_{j=1..J} min(hj, hj(y)).

        f_upper(Y) = Σ_{j=1..J} min(hj, hj(y∪))

   where h is the query histogram, h(y) the histogram of box y, and y∪ the
   union of all boxes in Y.

   As fast as for a single box: O(J) with integral histograms.
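A sketch with per-bin integral histograms; `bin_map` holds each pixel's quantized feature index and `h_query` is the query histogram (both illustrative). Because box histograms only grow with the box and min is monotone in its second argument, evaluating at the union box bounds every box in the set:

```python
import numpy as np

def integral_histograms(bin_map, n_bins):
    """Per-bin integral images over a map of quantized feature indices,
    so the histogram of any box costs O(J) lookups."""
    planes = np.stack([(bin_map == j).astype(int) for j in range(n_bins)])
    return np.pad(planes.cumsum(axis=1).cumsum(axis=2), ((0, 0), (1, 0), (1, 0)))

def box_histogram(ii, box):
    l, t, r, b = box                     # inclusive pixel coordinates
    return ii[:, b + 1, r + 1] - ii[:, t, r + 1] - ii[:, b + 1, l] + ii[:, t, l]

def histogram_upper_bound(h_query, ii, y_union):
    """Bound for a whole box set: evaluate the intersection at the union box."""
    return np.minimum(h_query, box_histogram(ii, y_union)).sum()
```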
Evaluation: Speed (on PASCAL VOC 2006)




   Sliding Window Runtime:
      • always: O(w²h²)
   Branch-and-Bound (ESS) Runtime:
      • worst case: O(w²h²)
      • empirical: no more than O(wh)
Extensions:


   Action classification: (y, t)opt = argmax_{(y,t) ∈ Y×T} fx(y, t)




   • J. Yuan: Discriminative 3D Subvolume Search for Efficient Action Detection, CVPR 2009.
Extensions:

   Localized image retrieval: (x, y)opt = argmax_{y ∈ Y, x ∈ D} fx(y)




   • C.L.: Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, ICCV 2009
Extensions:


   Hybrid – Branch-and-Bound with Implicit Shape Model




   • A. Lehmann, B. Leibe, L. van Gool: Feature-Centric Efficient Subwindow Search, ICCV 2009
Generalized Sliding Window




                       yopt = argmax_{y ∈ Y} f(y)


   with Y = {all rectangular regions in the image}

   • How to choose/construct/learn f ?
   • How to do the optimization efficiently and robustly?
Traditional Approach: Binary Classifier

    Training images:
      • x₁⁺, . . . , xₙ⁺ show the object
      • x₁⁻, . . . , xₘ⁻ show something else

    Train a classifier, e.g.
      • support vector machine,
      • boosted cascade,
      • artificial neural network,. . .

    Decision function f : {images} → R
      • f > 0 means “image shows the object.”
      • f < 0 means “image does not show
        the object.”
Traditional Approach: Binary Classifier


   Drawbacks:


      • Training distribution ≠ test distribution

      • No control over partial
        detections.

      • No guarantee to even find
        training examples again.
Object Localization as Structured Output Regression

   Ideal setup:
     • function
                    g : {all images} → {all boxes}
       to predict object boxes from images
     • train and test in the same way, end-to-end



                                 

       gcar( [image of a street scene] ) = [bounding box of the car]
Object Localization as Structured Output Regression

   Ideal setup:
     • function
                        g : {all images} → {all boxes}
       to predict object boxes from images
     • train and test in the same way, end-to-end

   Regression problem:
     • training examples (x1 , y1 ), . . . , (xn , yn ) ∈ X × Y
             xi are images, yi are bounding boxes
     • Learn a mapping
                                      g : X →Y
        that generalizes from the given examples:
             g(xi) ≈ yi, for i = 1, . . . , n.
Structured Support Vector Machine

     SVM-like framework by Tsochantaridis et al.:
        • Positive definite kernel k : (X × Y) × (X × Y) → R;
          ϕ : X × Y → H: (implicit) feature map induced by k.
        • ∆ : Y × Y → R: loss function
        • Solve the convex optimization problem

                         min_{w,ξ}  ½ ‖w‖² + C Σ_{i=1..n} ξi

          subject to margin constraints for i = 1, . . . , n:

          ∀y ∈ Y \ {yi} : ∆(y, yi) + ⟨w, ϕ(xi, y)⟩ − ⟨w, ϕ(xi, yi)⟩ ≤ ξi

        • unique solution: w* ∈ H

• I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large Margin Methods for Structured and Interdependent
  Output Variables, Journal of Machine Learning Research (JMLR), 2005.
Structured Support Vector Machine



      • w* defines the compatibility function

                         F(x, y) = ⟨w*, ϕ(x, y)⟩

     • best prediction for x is the most compatible y:

                        g(x) := argmax_{y ∈ Y} F(x, y).

     • evaluating g : X → Y is like generalized Sliding Window:
            for fixed x, evaluate quality function for every box y ∈ Y.
            for example, use previous branch-and-bound procedure!
Joint Image/Box-Kernel: Example


   Joint kernel: how to compare one (image, box) pair (x, y) with
   another (image, box) pair (x′, y′)?

   [Figure: kjoint of two (image, box) pairs equals k applied to the two box
   regions. It is large when both boxes show the object and small when they
   do not; an image-level term kimage(x, x′) could also be large.]
Loss Function: Example

   Loss function: how to compare two boxes y and y′?




           ∆(y, y′) := 1 − (area overlap between y and y′)
                     = 1 − area(y ∩ y′) / area(y ∪ y′)
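This overlap loss is straightforward to compute for axis-aligned boxes (l, t, r, b):

```python
def area(box):
    l, t, r, b = box
    return max(0, r - l) * max(0, b - t)

def delta(y, y2):
    """Overlap loss: 1 minus intersection-over-union of the two boxes."""
    l = max(y[0], y2[0]); t = max(y[1], y2[1])
    r = min(y[2], y2[2]); b = min(y[3], y2[3])
    inter = area((l, t, r, b))           # empty intersections clamp to 0
    union = area(y) + area(y2) - inter
    return 1.0 - inter / union if union > 0 else 1.0
```

The loss is 0 for identical boxes and 1 for disjoint ones, so the margin constraints demand larger margins for worse detections.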
Structured Support Vector Machine

      • S-SVM Optimization:   min_{w,ξ}  ½ ‖w‖² + C Σ_{i=1..n} ξi

        subject to, for i = 1, . . . , n:

    ∀y ∈ Y \ {yi} : ∆(y, yi) + ⟨w, ϕ(xi, y)⟩ − ⟨w, ϕ(xi, yi)⟩ ≤ ξi

      • Solve via constraint generation. Iterate:
             Solve the minimization over the current working set of constraints.
             Find the most violated constraint: argmax_{y∈Y} ∆(y, yi) + ⟨w, ϕ(xi, y)⟩.
             Add it to the working set.
      • Polynomial-time convergence to any precision ε.

     • Similar to bootstrap training, but with a margin.
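In outline, the constraint-generation loop might look as follows. `solve_qp` (a QP solver restricted to the working set) and `loss_aug_argmax` (the loss-augmented detection step, solvable with the branch-and-bound machinery) are assumed given, and neither name comes from the slides; the simple violation check below ignores the current slack, so it simplifies the exact stopping rule:

```python
def ssvm_train(examples, phi, delta, loss_aug_argmax, solve_qp, n_rounds=10):
    """Constraint-generation (cutting-plane) training sketch.

    examples: list of (x_i, y_i); phi(x, y): joint feature vector (numpy);
    delta(y, y_i): loss; loss_aug_argmax(w, x_i, y_i): most violated output;
    solve_qp(working_set): solves the S-SVM QP over the working set."""
    working_set = []
    w = solve_qp(working_set)
    for _ in range(n_rounds):
        new_constraints = 0
        for (x_i, y_i) in examples:
            y_hat = loss_aug_argmax(w, x_i, y_i)   # argmax Δ(y, yi) + <w, φ(xi, y)>
            violation = delta(y_hat, y_i) + w @ phi(x_i, y_hat) - w @ phi(x_i, y_i)
            if violation > 1e-6:                   # constraint violated
                working_set.append((x_i, y_i, y_hat))
                new_constraints += 1
        if new_constraints == 0:
            break                                  # no violated constraints left
        w = solve_qp(working_set)
    return w
```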
Evaluation: PASCAL VOC 2006




           Example detections for VOC 2006 bicycle, bus and cat.




          Precision–recall curves for VOC 2006 bicycle, bus and cat.

    • Structured regression improves detection accuracy.
    • New best scores (at that time) in 6 of 10 classes.
Why does it work?




        Learned weights from binary (center) and structured training (right).



     • Both methods assign positive weights to object region.
     • Structured training also assigns negative weights to
       features surrounding the bounding box position.
     • Posterior distribution over box coordinates becomes more
       peaked.
More Recent Results (PASCAL VOC 2009)

   [Example detections, one slide per class, for all 20 VOC 2009 classes:
   aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow,
   diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa,
   train, tvmonitor.]
Extensions:

      Image segmentation with connectedness constraint:




               CRF segmentation                             connected CRF segmentation

 • S. Nowozin, C.L.: Global Connectivity Potentials for Random Field Models, CVPR 2009.
Summary


  Object Localization is a step towards image interpretation.

  Conceptual approach instead of algorithmic:
    • Branch-and-bound evaluation:
           don’t slide a window, but solve an argmax problem,
           ⇒ higher efficiency

    • Structured regression training:
           solve the prediction problem, not a classification proxy.
           ⇒ higher localization accuracy

    • Modular and kernelized:
           easily adapted to other problems/representations, e.g.
           image segmentations
Safe Software
 
The Future of Product Management in AI ERA.pdf
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Enhance GitHub Copilot using MCP - Enterprise version.pdf
Nilesh Gule
 
Quantum AI: Where Impossible Becomes Probable
Quantum AI: Where Impossible Becomes Probable
Saikat Basu
 

Structured regression for efficient object detection

  • 1. Structured Regression for Efficient Object Detection Christoph Lampert www.christoph-lampert.org Max Planck Institute for Biological Cybernetics, Tübingen December 3rd, 2009 • [C.L., Matthew B. Blaschko, Thomas Hofmann. CVPR 2008] • [Matthew B. Blaschko, C.L. ECCV 2008] • [C.L., Matthew B. Blaschko, Thomas Hofmann. PAMI 2009]
  • 3. Category-Level Object Localization What objects are present? person, car
  • 4. Category-Level Object Localization Where are the objects?
  • 5. Object Localization ⇒ Scene Interpretation A man inside of a car A man outside of a car ⇒ He’s driving. ⇒ He’s passing by.
  • 6. Algorithmic Approach: Sliding Window f (y1 ) = 0.2 f (y2 ) = 0.8 f (y3 ) = 1.5 Use a (pre-trained) classifier function f : • Place candidate window on the image. • Iterate: Evaluate f and store result. Shift candidate window by k pixels. • Return position where f was largest.
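The sliding-window loop on this slide can be sketched in Python (the scoring function `f`, window size, and stride `k` are illustrative placeholders, not the deck's actual classifier):

```python
import numpy as np

def sliding_window(image, f, win_w, win_h, k):
    """Return (best score, (left, top)) over all placements of a
    win_w x win_h window, shifted in steps of k pixels."""
    H, W = image.shape[:2]
    best_score, best_pos = float("-inf"), None
    for top in range(0, H - win_h + 1, k):        # shift window by k pixels
        for left in range(0, W - win_w + 1, k):
            score = f(image[top:top + win_h, left:left + win_w])
            if score > best_score:                # store the best result
                best_score, best_pos = score, (left, top)
    return best_score, best_pos
```

This makes the drawbacks on the next slide concrete: one fixed window shape, a grid controlled by k, and one evaluation of f per grid cell.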
  • 7. Algorithmic Approach: Sliding Window f (y1 ) = 0.2 f (y2 ) = 0.8 f (y3 ) = 1.5 Drawbacks: • single scale, single aspect ratio → repeat with different window sizes/shapes • search on grid → speed–accuracy tradeoff • computationally expensive
  • 8. New view: Generalized Sliding Window Assumptions: • Objects are rectangular image regions of arbitrary size. • The score of f is largest at the correct object position. Mathematical Formulation: yopt = argmax f (y) y∈Y with Y = {all rectangular regions in image}
  • 9. New view: Generalized Sliding Window Mathematical Formulation: yopt = argmax f (y) y∈Y with Y = {all rectangular regions in image} • How to choose/construct/learn the function f ? • How to do the optimization efficiently and robustly? (exhaustive search is too slow, O(w2 h2 ) elements).
  • 10. New view: Generalized Sliding Window Mathematical Formulation: yopt = argmax f (y) y∈Y with Y = {all rectangular regions in image} • How to choose/construct/learn the function f ? • How to do the optimization efficiently and robustly? (exhaustive search is too slow, O(w2 h2 ) elements).
  • 11. New view: Generalized Sliding Window Use the problem’s geometric structure:
  • 12. New view: Generalized Sliding Window Use the problem’s geometric structure: • Calculate scores for sets of boxes jointly. • If no element can contain the maximum, discard the box set. • Otherwise, split the box set and iterate. → Branch-and-bound optimization • finds global maximum yopt
  • 13. New view: Generalized Sliding Window Use the problem’s geometric structure: • Calculate scores for sets of boxes jointly. • If no element can contain the maximum, discard the box set. • Otherwise, split the box set and iterate. → Branch-and-bound optimization • finds global maximum yopt
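The branch-and-bound iteration above can be sketched generically with a best-first priority queue; the `upper_bound`, `split`, and `is_single` callbacks are assumptions standing in for the box-set machinery of the following slides:

```python
import heapq
from itertools import count

def branch_and_bound(root, upper_bound, split, is_single):
    """Best-first branch-and-bound: always expand the candidate set with the
    highest upper bound; when a single element reaches the front of the
    queue, its bound equals its exact score, so it is the global maximizer."""
    tie = count()  # tie-breaker so the heap never compares candidate sets
    heap = [(-upper_bound(root), next(tie), root)]
    while heap:
        neg_bound, _, cand = heapq.heappop(heap)
        if is_single(cand):            # bound is tight for a single element
            return cand, -neg_bound
        for part in split(cand):       # otherwise: branch and re-queue
            heapq.heappush(heap, (-upper_bound(part), next(tie), part))
```

Because candidate sets are expanded in order of decreasing upper bound, the first single element popped is guaranteed to be the global maximum y_opt, and box sets whose bound never reaches the front are implicitly discarded.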
  • 14. Representing Sets of Boxes • Boxes: [l, t, r, b] ∈ R4 .
  • 15. Representing Sets of Boxes • Boxes: [l, t, r, b] ∈ R4 . Boxsets: [L, T, R, B] ∈ (R2 )4
  • 16. Representing Sets of Boxes • Boxes: [l, t, r, b] ∈ R4 . Boxsets: [L, T, R, B] ∈ (R2 )4 Splitting: • Identify largest interval.
  • 17. Representing Sets of Boxes • Boxes: [l, t, r, b] ∈ R4 . Boxsets: [L, T, R, B] ∈ (R2 )4 Splitting: • Identify largest interval. Split at center: R → R1 ∪R2 .
  • 18. Representing Sets of Boxes • Boxes: [l, t, r, b] ∈ R4 . Boxsets: [L, T, R, B] ∈ (R2 )4 Splitting: • Identify largest interval. Split at center: R → R1 ∪R2 . • New box sets: [L, T, R1 , B]
  • 19. Representing Sets of Boxes • Boxes: [l, t, r, b] ∈ R4 . Boxsets: [L, T, R, B] ∈ (R2 )4 Splitting: • Identify largest interval. Split at center: R → R1 ∪R2 . • New box sets: [L, T, R1 , B] and [L, T, R2 , B].
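The splitting rule from these slides can be sketched as follows, with a box set represented as four `(lo, hi)` coordinate intervals [L, T, R, B]:

```python
def split_box_set(box_set):
    """Split a box set [L, T, R, B] (each entry an interval (lo, hi))
    along its largest coordinate interval, at the interval's center."""
    widths = [hi - lo for lo, hi in box_set]
    i = widths.index(max(widths))                  # largest interval, e.g. R
    lo, hi = box_set[i]
    mid = (lo + hi) // 2
    left, right = list(box_set), list(box_set)
    left[i], right[i] = (lo, mid), (mid + 1, hi)   # R -> R1 ∪ R2
    return tuple(left), tuple(right)
```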
  • 20. Calculating Scores for Box Sets Example: Linear Support-Vector-Machine f (y) := Σ_{pi ∈ y} wi . Upper bound: f_upper(Y) = Σ_{pi ∈ y∩} min(0, wi ) + Σ_{pi ∈ y∪} max(0, wi ), where y∪ / y∩ are the union / intersection of all boxes in Y. Can be computed in O(1) using integral images.
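A sketch of this bound with integral images (summed-area tables): positive weights are summed over the union box y∪, negative weights over the intersection box y∩, so the bound for any box set costs O(1) lookups. Function names and the interval representation are illustrative:

```python
import numpy as np

def integral(img):
    """Summed-area table with a zero border, so box sums are O(1)."""
    s = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    s[1:, 1:] = img.cumsum(0).cumsum(1)
    return s

def box_sum(s, l, t, r, b):
    """Sum over pixels (t..b, l..r), inclusive, from integral image s."""
    return s[b + 1, r + 1] - s[t, r + 1] - s[b + 1, l] + s[t, l]

def upper_bound(w_pos, w_neg, box_set):
    """f_upper(Y): positive weights over the union box y∪,
    negative weights over the intersection box y∩ (if non-empty)."""
    L, T, R, B = box_set
    u = box_sum(w_pos, L[0], T[0], R[1], B[1])     # union box
    if L[1] <= R[0] and T[1] <= B[0]:              # intersection box exists
        u += box_sum(w_neg, L[1], T[1], R[0], B[0])
    return u
```

For a box set containing a single box, union and intersection coincide and the bound equals the exact score, as branch-and-bound requires.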
  • 21. Calculating Scores for Box Sets Histogram Intersection Similarity: f (y) := Σ_{j=1..J} min(hj , hj^y ). Upper bound: f_upper(Y) = Σ_{j=1..J} min(hj , hj^{y∪} ). As fast as for a single box: O(J) with integral histograms.
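The histogram-intersection bound amounts to evaluating the same similarity on the union-box histogram, which dominates every per-box histogram entrywise. A minimal sketch (histograms as plain arrays; building them from integral histograms is omitted):

```python
import numpy as np

def hist_intersection(h, h_y):
    """f(y) = Σ_j min(h_j, h_j^y): query histogram h vs. box histogram h^y."""
    return float(np.minimum(h, h_y).sum())

def f_upper(h, h_union):
    """Upper bound over a box set: evaluate on the union-box histogram h^{y∪},
    which is >= h^y entrywise for every box y in the set."""
    return hist_intersection(h, h_union)
```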
  • 22. Evaluation: Speed (on PASCAL VOC 2006) Sliding Window Runtime: • always: O(w2 h2 ) Branch-and-Bound (ESS) Runtime: • worst-case: O(w2 h2 ) • empirical: not more than O(wh)
  • 23. Extensions: Action classification: (y, t)opt = argmax(y,t)∈Y×T fx (y, t) • J. Yuan: Discriminative 3D Subvolume Search for Efficient Action Detection, CVPR 2009.
  • 24. Extensions: Localized image retrieval: (x, y)opt = argmaxy∈Y, x∈D fx (y) • C.L.: Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, ICCV 2009
  • 25. Extensions: Hybrid – Branch-and-Bound with Implicit Shape Model • A. Lehmann, B. Leibe, L. van Gool: Feature-Centric Efficient Subwindow Search, ICCV 2009
  • 27. Generalized Sliding Window yopt = argmax f (y) y∈Y with Y = {all rectangular regions in image} • How to choose/construct/learn f ? • How to do the optimization efficiently and robustly?
  • 28. Traditional Approach: Binary Classifier Training images: + + • x1 , . . . , xn show the object − − • x1 , . . . , xm show something else Train a classifier, e.g. • support vector machine, • boosted cascade, • artificial neural network,. . . Decision function f : {images} → R • f > 0 means “image shows the object.” • f < 0 means “image does not show the object.”
  • 29. Traditional Approach: Binary Classifier Drawbacks: • Train distribution ≠ test distribution • No control over partial detections. • No guarantee to even find the training examples again.
  • 30. Object Localization as Structured Output Regression Ideal setup: • function g : {all images} → {all boxes} to predict object boxes from images • train and test in the same way, end-to-end (illustration: gcar (image) = bounding box; figure not preserved)
  • 31. Object Localization as Structured Output Regression Ideal setup: • function g : {all images} → {all boxes} to predict object boxes from images • train and test in the same way, end-to-end Regression problem: • training examples (x1 , y1 ), . . . , (xn , yn ) ∈ X × Y xi are images, yi are bounding boxes • Learn a mapping g : X →Y that generalizes from the given examples: g(xi ) ≈ yi , for i = 1, . . . , n,
  • 32. Structured Support Vector Machine SVM-like framework by Tsochantaridis et al.: • Positive definite kernel k : (X × Y) × (X × Y) → R; ϕ : X × Y → H: (implicit) feature map induced by k. • ∆ : Y × Y → R: loss function • Solve the convex optimization problem min_{w,ξ} ½‖w‖² + C Σ_{i=1..n} ξi subject to margin constraints for i = 1, . . . , n : ∀y ∈ Y ∖ {yi } : ∆(y, yi ) + ⟨w, ϕ(xi , y)⟩ − ⟨w, ϕ(xi , yi )⟩ ≤ ξi • unique solution: w∗ ∈ H • I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), 2005.
  • 33. Structured Support Vector Machine • w∗ defines the compatibility function F (x, y) = ⟨w∗, ϕ(x, y)⟩ • best prediction for x is the most compatible y: g(x) := argmax_{y∈Y} F (x, y). • evaluating g : X → Y is like generalized Sliding Window: for fixed x, evaluate the quality function for every box y ∈ Y; for example, use the previous branch-and-bound procedure!
  • 34. Joint Image/Box-Kernel: Example Joint kernel: how to compare one (image,box)-pair (x, y) with another (image,box)-pair (x′, y′)? (Illustrations not preserved: kjoint is large for pairs with matching objects and boxes, small for pairs with mismatched boxes, yet kimage on the images alone could still be large in the mismatched case.)
  • 35. Loss Function: Example Loss function: how to compare two boxes y and y′ ? ∆(y, y′ ) := 1 − area overlap between y and y′ = 1 − area(y ∩ y′ ) / area(y ∪ y′ )
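This box-overlap loss is directly computable; a minimal sketch with boxes as (l, t, r, b) tuples:

```python
def iou_loss(y, yp):
    """∆(y, y') = 1 − area(y ∩ y') / area(y ∪ y') for boxes (l, t, r, b)."""
    l, t = max(y[0], yp[0]), max(y[1], yp[1])     # intersection rectangle
    r, b = min(y[2], yp[2]), min(y[3], yp[3])
    inter = max(0, r - l) * max(0, b - t)
    area = lambda z: (z[2] - z[0]) * (z[3] - z[1])
    union = area(y) + area(yp) - inter
    return 1.0 - inter / union
```

The loss is 0 for identical boxes and 1 for disjoint ones, which is exactly the behavior the margin constraints of the S-SVM exploit.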
  • 36. Structured Support Vector Machine • S-SVM Optimization: min_{w,ξ} ½‖w‖² + C Σ_{i=1..n} ξi subject to for i = 1, . . . , n : ∀y ∈ Y ∖ {yi } : ∆(y, yi ) + ⟨w, ϕ(xi , y)⟩ − ⟨w, ϕ(xi , yi )⟩ ≤ ξi
  • 37. Structured Support Vector Machine • S-SVM Optimization: min_{w,ξ} ½‖w‖² + C Σ_{i=1..n} ξi subject to for i = 1, . . . , n : ∀y ∈ Y ∖ {yi } : ∆(y, yi ) + ⟨w, ϕ(xi , y)⟩ − ⟨w, ϕ(xi , yi )⟩ ≤ ξi • Solve via constraint generation: • Iterate: Solve minimization with working set of constraints Identify argmax_{y∈Y} ∆(y, yi ) + ⟨w, ϕ(xi , y)⟩ Add violated constraints to working set and iterate • Polynomial-time convergence to any precision ε • Similar to bootstrap training, but with a margin.
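The key step of the constraint-generation loop, loss-augmented inference, can be sketched over a small finite label set (the toy `phi` and `delta` below are illustrative stand-ins, not the deck's features or loss):

```python
import numpy as np

def most_violated(w, phi, delta, x, y_true, Y):
    """Loss-augmented inference: return argmax_{y in Y} of
    ∆(y, y_i) + <w, ϕ(x_i, y)> — the constraint to add to the working set."""
    return max(Y, key=lambda y: delta(y, y_true) + float(w @ phi(x, y)))
```

In the localization setting, Y is the set of all boxes, so this argmax is itself solved with the branch-and-bound procedure rather than by enumeration.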
  • 38. Evaluation: PASCAL VOC 2006 Example detections for VOC 2006 bicycle, bus and cat. Precision–recall curves for VOC 2006 bicycle, bus and cat. • Structured regression improves detection accuracy. • New best scores (at that time) in 6 of 10 classes.
  • 39. Why does it work? Learned weights from binary (center) and structured training (right). • Both methods assign positive weights to object region. • Structured training also assigns negative weights to features surrounding the bounding box position. • Posterior distribution over box coordinates becomes more peaked.
  • 40.–59. More Recent Results (PASCAL VOC 2009): one slide of example detections per class — aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor (detection figures not preserved).
  • 60. Extensions: Image segmentation with connectedness constraint: CRF segmentation connected CRF segmentation • S. Nowozin, C.L.: Global Connectivity Potentials for Random Field Models, CVPR 2009.
  • 61. Summary Object Localization is a step towards image interpretation. Conceptual approach instead of algorithmic: • Branch-and-bound evaluation: don’t slide a window, but solve an argmax problem, ⇒ higher efficiency • Structured regression training: solve the prediction problem, not a classification proxy. ⇒ higher localization accuracy • Modular and kernelized: easily adapted to other problems/representations, e.g. image segmentations