This document summarizes the DenseBox paper, which introduces a unified end-to-end fully convolutional network (FCN) framework for object detection. The key points are:
1. DenseBox directly predicts bounding boxes and class confidences across all locations and scales of an image using a single FCN, showing that a one-stage detector can handle objects at different scales.
2. DenseBox is designed to detect small and occluded objects by fusing features from different convolutional layers and generating dense predictions.
3. It is trained jointly on classification, bounding-box regression, and optionally landmark localization, using multiple loss functions together with hard negative mining.
4. Experiments on face and car detection datasets show DenseBox achie
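
To make the multi-task training in point 3 more concrete, the following is a minimal NumPy sketch of a combined classification and box-regression loss with hard negative mining. It is not taken from the paper: the function name `multitask_loss`, the squared-error form of both terms, and the parameters `neg_ratio` and `box_weight` are assumptions chosen for illustration, and the optional landmark term is omitted.

```python
import numpy as np

def multitask_loss(score_pred, score_gt, box_pred, box_gt,
                   neg_ratio=1.0, box_weight=1.0):
    """Hypothetical combined loss over a dense prediction map.

    score_pred, score_gt: (H, W) confidence maps.
    box_pred, box_gt:     (H, W, 4) per-pixel box regression targets.
    Returns (loss, mask), where mask marks the pixels used for the
    classification term.
    """
    pos = score_gt > 0.5
    neg = ~pos
    # Per-pixel squared error (an assumed choice of loss).
    cls_err = (score_pred - score_gt) ** 2

    # Hard negative mining: keep only the highest-loss negatives,
    # roughly neg_ratio negatives per positive (assumed ratio).
    n_pos = int(pos.sum())
    n_neg = min(int(neg.sum()), max(1, int(neg_ratio * n_pos)))
    neg_err = np.where(neg, cls_err, -np.inf).ravel()
    hard_idx = np.argsort(neg_err)[::-1][:n_neg]

    mask = pos.copy().ravel()
    mask[hard_idx] = True
    mask = mask.reshape(pos.shape)

    cls_loss = cls_err[mask].mean()
    # Box regression is only supervised at positive pixels.
    box_loss = ((box_pred - box_gt) ** 2)[pos].mean() if n_pos > 0 else 0.0
    return cls_loss + box_weight * box_loss, mask
```

Restricting the classification term to positives plus a small set of high-loss negatives keeps the many easy background pixels in a dense output map from dominating the gradient, which is the usual motivation for hard negative mining.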