"Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Copyright © 2016 Auviz Systems
Semantic Segmentation for Scene Understanding:
Algorithms and Implementations
Nagesh Gupta
May 3, 2016

• Auviz Systems
• Introduction to Semantic Segmentation
• Quick survey of techniques
• Fully Convolutional Network
• Implementation architectures & results
• FPGA & GPU implementations
• References
Topics

• Independent software vendor (ISV) specializing in implementing & optimizing algorithms on FPGAs
• Offers libraries of different classes of algorithms
  • AuvizCV: optimized OpenCV algorithms
  • AuvizLA: optimized BLAS
  • AuvizDNN: optimized deep neural networks
• Develops applications in Computer Vision, Linear Algebra, Deep Learning & Machine Learning
• Libraries are available as OpenCL function calls, so software users are shielded from the complexity of using an FPGA
• Visit our booth & see Semantic Segmentation running on Xilinx FPGA!
Auviz Systems

Introduction — Image Classification
[Figure labels: Computer Vision; output label: Giraffe]

Introduction — Semantic Segmentation
[Figure label: Computer Vision]

Object Detection vs. Semantic Segmentation

Applications of Semantic Segmentation
Automotive: Free space detection
Monocular depth estimation
Boundary prediction

A Survey of Different Methods for Semantic Segmentation
Reference paper | SIFT Flow pixel accuracy (%)
C. Liu, J. Yuen, and A. Torralba, “SIFT flow: Dense correspondence across scenes and its applications” | 76.7
D. Eigen and R. Fergus, “Nonparametric image parsing using adaptive neighbor sets” | 77.1
H. J. Myeong, Y. Chang, and K. M. Lee, “Learning object relationships via graph-based context model” | 77.1
P. H. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene parsing” | 77.7
C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling” | 78.5
J. Tighe and S. Lazebnik, “Finding things: Image parsing with regions and per-exemplar detectors” | 78.6
J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation” | 85.2
G. Lin, C. Shen, A. van den Hengel, and I. Reid, “Exploring context with deep structured models for semantic segmentation” | 88.1

• An input image retains global features and loses the local details as it goes through
convolutions
• A CNN has several sub-sampling layers, which reduce the size of the input image
Classification Networks

• Replacing the fully connected layers in a CNN with convolutions preserves the spatial layout, so the network outputs a heat-map of class scores instead of a single label
• Use the “heat-map” to segment the original image
• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional
networks for semantic segmentation”
From Classification to Semantic Segmentation
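The step from classifier to heat-map can be sketched in plain Python. This is a toy illustration, not code from the presentation: the weights, the inputs, and the function names `fully_connected` and `conv_heat_map` are all made up. A dense layer that scores one 2x2 patch is reused as a 2x2 convolution, and sliding it over a larger feature map gives one score per location.

```python
def fully_connected(patch, weights, bias):
    """Score one k x k patch with a dense layer (flattened dot product)."""
    flat_p = [v for row in patch for v in row]
    flat_w = [v for row in weights for v in row]
    return sum(p * w for p, w in zip(flat_p, flat_w)) + bias


def conv_heat_map(feature_map, weights, bias):
    """Slide the same weights over the map (stride 1, no padding)."""
    k = len(weights)
    rows = len(feature_map) - k + 1
    cols = len(feature_map[0]) - k + 1
    heat = []
    for r in range(rows):
        heat_row = []
        for c in range(cols):
            patch = [row[c:c + k] for row in feature_map[r:r + k]]
            heat_row.append(fully_connected(patch, weights, bias))
        heat.append(heat_row)
    return heat


# Hypothetical weights and inputs, chosen only for illustration.
weights = [[1.0, 0.0], [0.0, -1.0]]
bias = 0.5
small = [[1.0, 2.0], [3.0, 4.0]]                         # classifier-sized input
large = [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 1.0, 2.0]]

single_score = fully_connected(small, weights, bias)     # one label score
heat_map = conv_heat_map(large, weights, bias)           # one score per location
```

On the classifier-sized input the layer yields a single score. On the larger input the same weights yield a 2x2 grid whose top-left entry equals that score.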

• Multiple convolution layers followed by deconvolution layers and a
classifier
• Weights for all layers are learned through training using backpropagation
(gradient descent)
Fully Convolutional Networks (FCN)
[Figure labels: 3D convolution (x3), Sub-sampling (x3), Deconvolution, Softmax; output classes: Bird, Person]

• High resolution local information is lost due to down-sampling as we go from left
to right
• Skip layers overcome this by combining the global semantic information with
shallow features from layers prior to down-sampling
• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional
networks for semantic segmentation”
Skip Layers — Improve Pixel Accuracy
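A minimal sketch of the fusion a skip layer performs, assuming the simplest case: the coarse score map is upsampled by 2x and added element by element to a score map computed from a shallower layer. Nearest-neighbour upsampling stands in here for the deconvolution layer, and all values and function names are illustrative assumptions.

```python
def upsample2x(scores):
    """Nearest-neighbour 2x upsampling (stand-in for a learned deconvolution)."""
    out = []
    for row in scores:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse(coarse, fine):
    """Upsample the coarse scores and add the finer, shallower scores."""
    up = upsample2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


# Hypothetical score maps for a single class.
coarse = [[1.0, 2.0]]                  # deep layer: global, low resolution
fine = [[0.5, 0.0, 0.0, 0.5],
        [0.0, 0.5, 0.5, 0.0]]          # shallow layer: local, high resolution
fused = fuse(coarse, fine)
```

The fused map keeps the coarse layer's overall ranking of regions while the shallow layer adjusts individual pixels.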

Key parts of an FCN — Convolutions &
De-convolutions
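As a rough illustration of what a de-convolution (transposed convolution) layer computes, the 1-D sketch below lets each input sample add a scaled copy of the kernel to the output, with copies spaced `stride` samples apart. The kernel is fixed here and chosen so that interior outputs interpolate linearly between inputs. In an FCN the kernel weights would be learned. The function name and the values are assumptions made for this example.

```python
def transposed_conv1d(signal, kernel, stride):
    """Each input sample adds a scaled copy of the kernel to the output."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, value in enumerate(signal):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out


# Three coarse samples upsampled by a factor of two.
upsampled = transposed_conv1d([1.0, 2.0, 3.0], [0.5, 1.0, 0.5], stride=2)
```

The output has (3 - 1) * 2 + 3 = 7 samples. The input values reappear at every second position, and the values between them are averages of their neighbours.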

• Results for an FCN implemented with Caffe on a Tesla K40c GPU
• The FCN built on VGG16 achieves the best mean IoU, at the cost of additional latency
Implementation results — GPU
Metric | FCN-AlexNet | FCN-VGG16 | FCN-GoogLeNet
Mean IoU | 39.8 | 56.0 | 42.5
Forward time | 50 ms | 210 ms | 59 ms
Conv layers | 8 | 16 | 22
Max stride | 32 | 32 | 32
IoU (Intersection over Union) = |Sseg ∩ Shum| / |Sseg ∪ Shum|
Sseg: pixels from segmentation
Shum: pixels from ground truth
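For reference, a small sketch of how mean IoU can be computed from two label maps. For each class it builds the set of pixels the segmentation assigns to that class and the set the ground truth assigns to it, divides the size of their intersection by the size of their union, and averages over classes. The flattened label lists and function names are made up for the example.

```python
def iou(seg, truth, cls):
    """IoU for one class; returns None if the class appears in neither map."""
    s_seg = {i for i, label in enumerate(seg) if label == cls}
    s_hum = {i for i, label in enumerate(truth) if label == cls}
    union = s_seg | s_hum
    if not union:
        return None
    return len(s_seg & s_hum) / len(union)


def mean_iou(seg, truth, classes):
    """Average the per-class IoU over the classes that actually occur."""
    values = [iou(seg, truth, c) for c in classes]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)


# Hypothetical flattened label maps: 6 pixels, 2 classes.
segmentation = [0, 0, 1, 1, 1, 0]
ground_truth = [0, 1, 1, 1, 0, 0]
score = mean_iou(segmentation, ground_truth, classes=[0, 1])
```

In this example each class has 2 pixels in the intersection and 4 in the union, so both per-class values and the mean are 0.5.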

• GEMM
  • Convolutions and de-convolutions can be mapped into a GEMM kernel [6]
  • Requires significant data remapping, which costs more resources and latency
  • Re-mapping the data on the host CPU is another easy option when using the OpenCL development environment
• Convolutions
  • Implement convolutions & de-convolutions using convolution kernels
  • Some data re-mapping is needed to use the convolution kernel for de-convolutions
  • Possible to achieve higher performance in the FPGA
Implementation Architectures — FPGA
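The GEMM mapping can be illustrated with the common "im2col" approach, sketched below in plain Python under simplifying assumptions: a single-channel image, stride 1, no padding. Every kernel-sized patch is unrolled into a row of a matrix, and the convolution becomes a single matrix multiply against the flattened kernels. The function names are hypothetical, and this is not the scheme of any particular FPGA library.

```python
def im2col(image, k):
    """Unroll every k x k patch (stride 1, no padding) into one row."""
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[image[r + i][c + j] for i in range(k) for j in range(k)]
            for r in range(rows) for c in range(cols)]


def matmul(a, b):
    """Plain matrix multiply: the GEMM step."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv_as_gemm(image, kernels):
    """Return one row per output position and one column per kernel."""
    k = len(kernels[0])
    patches = im2col(image, k)                        # positions x (k*k)
    flat = [[v for row in kern for v in row] for kern in kernels]
    weight_matrix = [list(col) for col in zip(*flat)]  # (k*k) x kernels
    return matmul(patches, weight_matrix)


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[[1, 0], [0, 1]],    # sums the main diagonal of each patch
           [[1, 1], [0, 0]]]    # sums the top row of each patch
patch_matrix = im2col(image, 2)
result = conv_as_gemm(image, kernels)
```

The 9-pixel image expands into a 4 x 4 patch matrix with 16 entries, because overlapping patches duplicate pixels. That expansion is the data remapping cost noted in the bullets above.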

• OpenCL is a simpler and faster way to implement an FPGA accelerator
  • Xilinx SDAccel tools provide the OpenCL infrastructure
  • Altera (Intel) supports OpenCL
• The following infrastructure blocks are needed in addition to the accelerator
  • PCIe & DMA
  • External Memory Interface
• In a mid-range 28 nm FPGA such as the Xilinx Virtex-7 690T, 25-30% of the device is taken up by infrastructure blocks
• 60-70% of the FPGA is available to implement the accelerator kernel
• Expect to get 1024-1536 MACs, running in the frequency range of 200-300 MHz
• A good design can thus achieve 400-600 GOPS
FPGA Accelerator — Resource & Performance
Estimates
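The throughput estimate can be checked with simple arithmetic. The sketch below assumes each MAC counts as two operations per clock cycle (one multiply and one add). The slide does not state this convention, so treat it as an assumption.

```python
def peak_gops(num_macs, clock_mhz, ops_per_mac=2):
    """Peak throughput in GOPS: MACs x operations per MAC x clock rate."""
    return num_macs * ops_per_mac * clock_mhz / 1000.0


low = peak_gops(1024, 200)     # fewest MACs, slowest clock
mid = peak_gops(1024, 300)     # fewest MACs, fastest clock
high = peak_gops(1536, 300)    # most MACs, fastest clock
```

Under that assumption, 1024 MACs at 200 to 300 MHz give about 410 to 614 GOPS, which matches the 400-600 GOPS figure. The most optimistic corner, 1536 MACs at 300 MHz, would give about 922 GOPS, so the quoted range appears to be a conservative one.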

Use Model — GPU
[Figure labels: Fully connected, Forward convolution]

Use Model — FPGA
[Figure labels: Forward conv, Fully connected]

• OpenCL is beginning to be the method of choice to implement CNNs [6] [7]
• AuvizDNN is a flexible framework built using OpenCL
FPGA Implementation Using OpenCL
Host code
• API calls are initiated by the host
• Calling APIs with different parameters creates new networks
• Recompile on the CPU to create new networks
• Use model similar to CPU/GPU
Kernel binary
• Highly optimized for performance
• Supports a wide range of API parameters
• FPGA recompilation/timing closure not needed
• No FPGA tools expertise required
• Available for different accelerator boards supported by FPGA vendors

FPGA — Implementation Results
• Semantic segmentation with 2-21 classes on a 500x500 image
• Network similar to AlexNet
• Results for the XC7VX690 device are based on achieved performance; the rest are projected
[Bar chart: Images/Second per device, axis from 0 to 140; bar values not recoverable]

GPU | FPGA
Mature use model and rich set of libraries available | Libraries and use model are beginning to catch up to GPU
Used extensively for training of CNNs | Serious contender for deployment in the data center & embedded applications
Traditionally higher in power | Typically lower power draw
Well integrated into most CNN R&D frameworks such as Caffe | Loosely integrated with Caffe
Entrenched in the research community; used by most publications & researchers | FPGAs are extensively used in embedded applications
Implementation Choice: FPGA/GPU

• [1] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062.
• [2] Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1915-1929.
• [3] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).
• [4] Badrinarayanan, V., Handa, A., & Cipolla, R. (2015). SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labeling. arXiv preprint arXiv:1505.07293.
• [5] Liu, C., Yuen, J., & Torralba, A. (2011). SIFT flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 978-994.
• [6] Suda, N., et al. (2016). Throughput optimized OpenCL-based FPGA accelerator for large-scale CNNs. ISFPGA 2016.
• [7] Altera. Efficient implementation of neural network systems built on FPGAs, programmed with OpenCL. Altera white paper.
References
