Yolo

You Only Look Once:
Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi (2016)

The YOLO Detection System
(1) resizes the input image to 448 × 448.
(2) runs a single convolutional network on the image.
(3) thresholds the resulting detections by the model’s confidence.

https://www.jeremyjordan.me/object-detection-one-stage/
Non-maximum suppression
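Step (3) and non-maximum suppression can be sketched together as a greedy filter. This is only a minimal illustration, not the authors' implementation: the function names, the corner-format boxes (x1, y1, x2, y2), and both threshold values are assumptions made for this example.

```python
def _iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2); helper for this sketch."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5, score_threshold=0.2):
    """Return indices of the boxes kept, highest score first.

    Both thresholds are illustrative defaults, not values taken from the slides.
    """
    # Confidence thresholding: drop low-scoring detections first.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a box already kept.
        if all(_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

With two heavily overlapping boxes and one distant box, the lower-scoring member of the overlapping pair is suppressed and the other two are kept.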

Bounding Box, Confidence and Class Probability
YOLO reframes object detection as a regression problem.
• The image is divided into an S × S grid, and each grid cell predicts B bounding boxes (x, y, w, h), a confidence score for each box, and C class probabilities.
• These predictions are encoded as an S × S × (B ∗ 5 + C) tensor.

Bounding Box, Confidence and Class Probability
The confidence of the bounding box
Formally we define confidence as Pr(Object) ∗ IOU, where IOU is the intersection over union between the predicted box and the ground truth. If no object exists in that cell, the confidence score should be zero.
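The confidence definition can be written out as a small sketch. The function names are hypothetical. The second function reflects the paper's test-time scoring, where the conditional class probability is multiplied by the box confidence. The slides do not show that step, so treat it as supplementary.

```python
def confidence_target(object_in_cell, iou_with_truth):
    """Target confidence for one predicted box.

    Pr(Object) is 1 when an object falls in the cell and 0 otherwise,
    so the target is either the IOU itself or zero.
    """
    return iou_with_truth if object_in_cell else 0.0


def class_specific_score(class_prob, box_confidence):
    """Class-specific confidence: Pr(Class | Object) * Pr(Object) * IOU."""
    return class_prob * box_confidence
```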

The Neural Network Architecture
For evaluating YOLO on PASCAL VOC, we use S = 7, B = 2. PASCAL VOC has 20 labelled
classes so C = 20. Our final prediction is a 7 × 7 × (2∗5 + 20) tensor.
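The tensor size can be checked with a few lines of arithmetic. This sketch also splits one cell's vector into its parts. The ordering used here (the B boxes first, then the C class probabilities) is an assumption for illustration only, since the slides do not state how the values are laid out.

```python
def output_shape(S, B, C):
    """Shape of the prediction tensor: S x S x (B * 5 + C)."""
    return (S, S, B * 5 + C)


def split_cell(cell, B, C):
    """Split one grid cell's vector into B boxes and C class probabilities.

    Assumed layout (hypothetical): [x, y, w, h, conf] repeated B times,
    followed by the C class probabilities.
    """
    if len(cell) != B * 5 + C:
        raise ValueError("cell vector has the wrong length")
    boxes = [tuple(cell[b * 5:(b + 1) * 5]) for b in range(B)]
    class_probs = list(cell[B * 5:])
    return boxes, class_probs


# PASCAL VOC setting from the slide: S = 7, B = 2, C = 20.
shape = output_shape(7, 2, 20)
total_values = shape[0] * shape[1] * shape[2]
```

For the PASCAL VOC setting this gives a 7 × 7 × 30 tensor, that is, 1470 predicted values per image.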

Loss Function
The size of the bounding box
The confidence of the bounding box
The probability of the class

Intersection Over Union (IOU) and Object Detection
https://devblogs.nvidia.com/exploring-spacenet-dataset-using-digits/
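IOU is the area of overlap between two boxes divided by the area of their union. A minimal sketch, assuming axis-aligned boxes in corner format (x1, y1, x2, y2); note that YOLO itself predicts (x, y, w, h), so a conversion would be needed first.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union = sum of both areas minus the overlap counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two 2 × 2 boxes offset by one unit in each direction overlap in a 1 × 1 square, so their IOU is 1 / (4 + 4 − 1) = 1/7.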

Recall-Precision Curve and Average Precision
https://acutecaretesting.org/en/articles/precision-recall-curves-what-are-they-and-how-are-they-used
Ideally, precision does not decrease as recall increases.
In general, the Average Precision (AP) is defined as the area under the precision-recall curve.

https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173
The dataset contains only 5 apples. We collect all the predictions made for apples in all the images and rank them in descending order of predicted confidence.
The second column indicates whether each prediction is correct. In this example, a prediction is correct if IoU ≥ 0.5.
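Walking down the ranked list gives one precision-recall point per prediction. The sketch below assumes the predictions are already sorted by descending confidence. The correctness flags in the usage lines are made up for illustration and are not the values from the referenced table.

```python
def precision_recall_points(is_correct, num_ground_truth):
    """Return (precision, recall) after each prediction in a ranked list.

    is_correct: booleans ordered by descending confidence
                (True means the prediction matched with IoU >= 0.5).
    num_ground_truth: number of real objects of this class in the dataset.
    """
    points = []
    true_positives = 0
    for rank, correct in enumerate(is_correct, start=1):
        if correct:
            true_positives += 1
        precision = true_positives / rank            # correct so far / predictions so far
        recall = true_positives / num_ground_truth   # correct so far / all real objects
        points.append((precision, recall))
    return points


# Hypothetical flags for a dataset with 5 apples.
points = precision_recall_points([True, True, False, False, True], 5)
```

Recall never decreases as more predictions are included, while precision moves up and down depending on whether each new prediction is correct.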

Average Precision and mean Average Precision
Average Precision (AP) is the area under the precision-recall curve.
For the 11-point interpolated AP, the recall axis from 0 to 1.0 is divided into 11 equally spaced points (0, 0.1, ..., 1.0), and the interpolated precision values at those points are averaged.
mAP (mean average precision) is the average of the AP over all classes.
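The 11-point interpolated AP and mAP can be sketched as follows. The interpolation rule used here (take the highest precision found at any recall at or above each level) is the PASCAL VOC 2007 convention, and the function names are hypothetical.

```python
def eleven_point_ap(precisions, recalls):
    """11-point interpolated AP from matched lists of precision and recall."""
    total = 0.0
    for k in range(11):
        level = k / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= this level.
        # The small tolerance guards against floating-point rounding.
        candidates = [p for p, r in zip(precisions, recalls) if r >= level - 1e-9]
        total += max(candidates) if candidates else 0.0
    return total / 11


def mean_average_precision(ap_per_class):
    """mAP: the plain average of per-class AP values (dict of class -> AP)."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

Recall levels that the detector never reaches contribute zero precision, so a detector that misses many objects is penalised even if its top predictions are all correct.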

Comparison to Other Real-Time Systems
Fast YOLO uses a neural network with fewer convolutional layers (9 instead of 24) and fewer filters in those layers.
YOLO is 10 mAP more accurate than the fast version while still well above real-time in speed.

VOC 2007 Error Analysis
• Correct: correct class and IOU > .5
• Localization: correct class, .1 < IOU < .5
• Similar: class is similar, IOU > .1
• Other: class is wrong, IOU > .1
• Background: IOU < .1 for any object
Localization errors account for more of YOLO’s errors than all other sources combined. Fast R-CNN makes far fewer localization errors but far more background errors.
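The five categories above amount to a small decision rule. In this sketch the `similar_classes` argument is a hypothetical way to say which class pairs count as similar. The handling of the exact boundaries (IOU equal to .1 or .5) is also an assumption, since the list leaves them undefined.

```python
def classify_detection(predicted, truth, best_iou, similar_classes=frozenset()):
    """Assign one detection to an error category.

    predicted, truth: class labels of the detection and its best-matching object.
    best_iou: highest IOU between the detection and any ground-truth object.
    similar_classes: hypothetical set of (class_a, class_b) pairs deemed similar.
    """
    if best_iou < 0.1:
        return "background"
    if predicted == truth:
        # Assumption: an IOU of exactly .5 (or exactly .1) counts as localization.
        return "correct" if best_iou > 0.5 else "localization"
    if (predicted, truth) in similar_classes or (truth, predicted) in similar_classes:
        return "similar"
    return "other"
```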
