Copyright © 2016 Auviz Systems 1
Semantic Segmentation for Scene Understanding:
Algorithms and Implementations
Nagesh Gupta
May 3, 2016
Topics
• Auviz Systems
• Introduction to Semantic Segmentation
• Quick survey of techniques
• Fully Convolutional Network
• Implementation architectures & results
• FPGA & GPU implementations
• References
Auviz Systems
• ISV that specializes in implementing & optimizing algorithms on FPGAs
• Offers libraries for different classes of algorithms
  • AuvizCV: optimized OpenCV algorithms
  • AuvizLA: optimized BLAS
  • AuvizDNN: optimized deep neural networks
• Develop applications in Computer Vision, Linear Algebra, Deep Learning & Machine Learning
• Available as OpenCL function calls, so software users are shielded from the complexity of using an FPGA
• Visit our booth & see Semantic Segmentation running on a Xilinx FPGA!
Introduction: Image Classification
[Figure: an input image passes through a Computer Vision block, which outputs the label “Giraffe”]
Introduction: Semantic Segmentation
[Figure: an input image passes through a Computer Vision block]
Object Detection vs. Semantic Segmentation
Applications of Semantic Segmentation
• Automotive: free space detection
• Monocular depth estimation
• Boundary prediction
A Survey of Different Methods for Semantic Segmentation

Reference paper | SIFT Flow pixel accuracy (%)
C. Liu, J. Yuen, and A. Torralba, “Sift flow: Dense correspondence across scenes and its applications” | 76.7
D. Eigen and R. Fergus, “Nonparametric image parsing using adaptive neighbor sets” | 77.1
H. J. Myeong, Y. Chang, and K. M. Lee, “Learning object relationships via graph-based context model” | 77.1
P. H. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene parsing” | 77.7
C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling” | 78.5
J. Tighe and S. Lazebnik, “Finding things: Image parsing with regions and per-exemplar detectors” | 78.6
J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation” | 85.2
G. Lin, C. Shen, A. van den Hengel, and I. Reid, “Exploring Context with Deep Structured models for Semantic Segmentation” | 88.1
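The survey above ranks methods by pixel accuracy. As a minimal sketch of how that metric can be computed, assuming label maps stored as nested lists of integer class ids (the function name and the toy data are illustrative, not taken from the slides):

```python
def pixel_accuracy(predicted, ground_truth):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    correct = 0
    total = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for p, t in zip(pred_row, true_row):
            correct += int(p == t)
            total += 1
    return correct / total


# Toy 2x3 label maps (hypothetical data): five of the six pixels agree.
predicted = [[0, 0, 1],
             [0, 1, 1]]
ground_truth = [[0, 0, 1],
                [1, 1, 1]]
accuracy = pixel_accuracy(predicted, ground_truth)
```

Pixel accuracy rewards getting large regions right, which is one reason segmentation results are often reported with mean IoU as well.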
Classification Networks
• As an input image passes through successive convolutions, the network retains global features and loses local detail
• A CNN has several sub-sampling layers, each of which reduces the spatial size of its input
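To make the size reduction concrete, here is a small sketch that tracks the spatial size of a feature map through a chain of sub-sampling layers. It assumes five stride-2 stages (a total stride of 32, the "Max stride" figure quoted later in the deck) and ceiling rounding at each stage; both are assumptions, since the actual rounding depends on each layer's padding and kernel size.

```python
def downsampled_sizes(size, strides):
    """Return the spatial size after each sub-sampling stage.

    Assumes each stage rounds up (ceiling division); real layers may
    round down depending on padding and kernel size.
    """
    sizes = []
    for stride in strides:
        size = -(-size // stride)  # ceiling division
        sizes.append(size)
    return sizes


# A 500-pixel-wide input through five hypothetical stride-2 stages.
sizes = downsampled_sizes(500, [2, 2, 2, 2, 2])
```

Under these assumptions a 500-pixel side shrinks to 16, so each value in the final map summarizes roughly a 32x32 region of the input.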
From Classification to Semantic Segmentation
• Replacing the fully connected layers in a CNN with convolutions makes the network output a spatial “heat-map” instead of a single label
• Use the “heat-map” to segment the original image
• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation”
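The idea can be sketched on a toy single-channel example: a fully connected unit over a fixed-size patch computes the same weighted sum as a convolution whose kernel covers the whole patch, and sliding that kernel over a larger input yields a map of scores. The 2x2 weights and the inputs below are illustrative values, not taken from the slides, and the "convolution" is the cross-correlation used in CNNs.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding); return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def fully_connected(patch, weights):
    """One fully connected output unit: dot product of flattened patch and weights."""
    flat_patch = [v for row in patch for v in row]
    flat_weights = [v for row in weights for v in row]
    return sum(p * w for p, w in zip(flat_patch, flat_weights))


weights = [[1, 0], [0, -1]]          # hypothetical weights of one output unit
patch = [[3, 1], [2, 5]]             # input of the size the unit was built for
single_score = fully_connected(patch, weights)
as_convolution = conv2d_valid(patch, weights)

larger = [[3, 1, 4], [2, 5, 9], [6, 8, 7]]
heat_map = conv2d_valid(larger, weights)  # one score per 2x2 window
```

On the patch-sized input the two computations agree; on the larger input the convolutional form produces a 2x2 grid of scores, which is the "heat-map" behaviour described above.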
Fully Convolutional Networks (FCN)
• Multiple convolution layers followed by deconvolution layers and a classifier
• Weights for all layers are learned through training using backpropagation (gradient descent)
[Figure: FCN pipeline with three 3D convolution and sub-sampling stages, followed by deconvolution and softmax; example output labels: Bird, Person]
Skip Layers: Improve Pixel Accuracy
• High-resolution local information is lost to down-sampling as data moves from left to right through the network
• Skip layers overcome this by combining the global semantic information with shallow features from layers prior to down-sampling
• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation”
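A skip connection can be sketched as: upsample the coarse, semantically strong score map, then add it element-wise to a finer score map taken from an earlier layer. The sketch below uses nearest-neighbour upsampling as a simple stand-in for the learned deconvolution, and the score values are made up for illustration.

```python
def upsample_nearest(scores, factor):
    """Repeat every value `factor` times along both axes."""
    return [[value for value in row for _ in range(factor)]
            for row in scores for _ in range(factor)]


def fuse(coarse_scores, fine_scores, factor=2):
    """Upsample the coarse map and add it element-wise to the fine map."""
    upsampled = upsample_nearest(coarse_scores, factor)
    return [[u + f for u, f in zip(u_row, f_row)]
            for u_row, f_row in zip(upsampled, fine_scores)]


coarse = [[1.0, -1.0]]                  # 1x2 map from a deep layer (hypothetical)
fine = [[0.5, 0.0, 0.0, 0.5],           # 2x4 map from a shallower layer (hypothetical)
        [0.0, 0.5, 0.5, 0.0]]
fused = fuse(coarse, fine)
```

The fused map keeps the coarse layer's overall decision while the fine layer adjusts individual pixels, which is the effect the skip layers are meant to provide.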
Key parts of an FCN: Convolutions & De-convolutions
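A de-convolution (transposed convolution) can be sketched as the reverse of a strided convolution: every input value is multiplied by the kernel and accumulated into a larger output grid. This single-channel version uses a fixed all-ones kernel purely for illustration; in an FCN the kernel weights are learned like any other layer.

```python
def transposed_conv2d(feature_map, kernel, stride):
    """Scatter each input value, scaled by `kernel`, into an upsampled output."""
    kh, kw = len(kernel), len(kernel[0])
    in_h, in_w = len(feature_map), len(feature_map[0])
    out_h = (in_h - 1) * stride + kh
    out_w = (in_w - 1) * stride + kw
    output = [[0.0] * out_w for _ in range(out_h)]
    for r in range(in_h):
        for c in range(in_w):
            for i in range(kh):
                for j in range(kw):
                    output[r * stride + i][c * stride + j] += feature_map[r][c] * kernel[i][j]
    return output


coarse = [[1.0, 2.0],
          [3.0, 4.0]]
kernel = [[1.0, 1.0],
          [1.0, 1.0]]   # illustrative kernel: copies each value into a 2x2 block
upsampled = transposed_conv2d(coarse, kernel, stride=2)
```

With stride 2 and a 2x2 kernel the windows do not overlap, so the 2x2 input becomes a 4x4 output; larger kernels overlap and blend neighbouring values.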
Implementation Results: GPU
• Results of running an FCN with Caffe on a Tesla K40c GPU
• The FCN built on VGG16 gives the best mean IoU, at the cost of additional latency

Metric | FCN-AlexNet | FCN-VGG16 | FCN-GoogLeNet
Mean IoU | 39.8 | 56.0 | 42.5
Forward time | 50 ms | 210 ms | 59 ms
Conv layers | 8 | 16 | 22
Max stride | 32 | 32 | 32

IoU (Intersection over Union): $\mathrm{IoU} = |S_{seg} \cap S_{hum}| / |S_{seg} \cup S_{hum}|$, where $S_{seg}$ is the set of pixels from the segmentation and $S_{hum}$ is the set of pixels from the ground truth.
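As a sketch of the metric, the function below computes per-class IoU between a predicted label map (the segmentation pixels) and a ground-truth label map, then averages over classes. The nested-list representation, the function name, and the choice to skip classes absent from both maps are assumptions made for this example.

```python
def mean_iou(predicted, ground_truth, num_classes):
    """Average of per-class intersection-over-union between two label maps."""
    ious = []
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, true_row in zip(predicted, ground_truth):
            for p, t in zip(pred_row, true_row):
                in_seg = (p == cls)   # pixel assigned to cls by the segmentation
                in_hum = (t == cls)   # pixel labelled cls in the ground truth
                intersection += int(in_seg and in_hum)
                union += int(in_seg or in_hum)
        if union:                     # skip classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious)


predicted = [[0, 0, 1],
             [0, 1, 1]]
ground_truth = [[0, 0, 1],
                [1, 1, 1]]
score = mean_iou(predicted, ground_truth, num_classes=2)
```

In this toy case class 0 has IoU 2/3 and class 1 has IoU 3/4, so one mislabelled pixel lowers both classes' scores.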
Implementation Architectures: FPGA
• GEMM
  • Convolutions and de-convolutions can be mapped onto a GEMM kernel [6]
  • Requires significant data re-mapping, which costs more resources and latency
  • Re-mapping the data on the host CPU is another easy option when using the OpenCL development environment
• Convolutions
  • Implement convolutions & de-convolutions using convolution kernels
  • Some data re-mapping is needed to use the convolution kernel for de-convolutions
  • Possible to achieve higher performance in the FPGA
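The GEMM mapping can be sketched as follows: the data re-mapping step unrolls every input patch into a row (often called im2col), the filters become columns of a weight matrix, and a single matrix multiply then produces all filter responses. This is a single-channel, stride-1, pure-Python illustration with made-up values, not the layout of any particular FPGA kernel.

```python
def im2col(image, kh, kw):
    """Data re-mapping: unroll every kh x kw patch (stride 1, no padding) into a row."""
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
            for r in range(out_h) for c in range(out_w)]


def gemm(a, b):
    """Plain matrix multiply: (m x k) times (k x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


image = [[3, 1, 4],
         [2, 5, 9],
         [6, 8, 7]]
filters = [[[1, 0], [0, -1]],   # filter 0 (hypothetical)
           [[0, 1], [1, 0]]]    # filter 1 (hypothetical)

patches = im2col(image, 2, 2)   # 4 patches x 4 values
# One column per filter, rows in the same (i, j) order used by im2col.
weights = [[f[i][j] for f in filters] for i in range(2) for j in range(2)]
responses = gemm(patches, weights)   # 4 patches x 2 filters
```

The patch matrix repeats overlapping pixels (here 9 input values become 16), which illustrates why the re-mapping costs extra memory and latency compared with a direct convolution kernel.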
FPGA Accelerator: Resource & Performance Estimates
• OpenCL is a simpler and faster way to implement an FPGA accelerator
  • Xilinx SDAccel tools provide the OpenCL infrastructure
  • Altera (Intel) supports OpenCL
• The following infrastructure blocks are needed in addition to the accelerator
  • PCIe & DMA
  • External memory interface
• In a mid-range 28 nm FPGA such as the Xilinx Virtex-7 690T, 25-30% of the device is taken up by infrastructure blocks
• 60-70% of the FPGA is available to implement the accelerator kernel
• Expect to get 1024-1536 MACs, running in the frequency range of 200-300 MHz
• A good design can thus achieve 400-600 GOPS
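The throughput estimate can be reproduced with simple arithmetic. The sketch below assumes each MAC unit completes one multiply and one add per clock cycle, counted as 2 operations; the slides do not state how the GOPS figure was derived, so this counting convention is an assumption.

```python
OPS_PER_MAC = 2  # assumption: one multiply plus one add per MAC per cycle


def peak_gops(num_macs, clock_mhz):
    """Peak throughput in GOPS for `num_macs` MAC units clocked at `clock_mhz`."""
    return num_macs * OPS_PER_MAC * clock_mhz / 1000.0


low = peak_gops(1024, 200)    # fewest MACs at the lowest clock
high = peak_gops(1536, 200)   # most MACs at the lowest clock
```

Under this assumption the two MAC counts at 200 MHz give roughly 410 and 614 GOPS, in line with the quoted 400-600 GOPS; the same formula at 300 MHz gives higher peak figures, so the quoted range reads as a conservative estimate.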
Use Model: GPU
[Figure: GPU use model; labeled blocks: Fully connected, Forward convolution]
Use Model: FPGA
[Figure: FPGA use model; labeled blocks: Forward conv, Fully connected]
FPGA Implementation Using OpenCL
• OpenCL is beginning to be the method of choice to implement CNNs [6] [7]
• AuvizDNN is a flexible framework built using OpenCL

Host code
• API calls are initiated by the host
• Calling the APIs with different parameters creates new networks
• Recompile on the CPU to create new networks
• Use model similar to CPU/GPU

Kernel binary
• Highly optimized for performance
• Supports a wide range of API parameters
• FPGA recompilation/timing closure not needed
• No FPGA tools expertise needed
• Available for different accelerator boards supported by FPGA vendors
FPGA: Implementation Results
• Semantic segmentation with 2-21 classes on a 500x500 image
• Network similar to AlexNet
• Results for the XC7VX690 device are based on achieved performance; the rest are projected
[Figure: bar chart of throughput in Images/Second, axis from 0 to 140]
Implementation Choice: FPGA/GPU

GPU | FPGA
Mature use model and rich set of libraries available | Libraries and use model are beginning to catch up to GPU
Used extensively for training of CNNs | Serious contender for deployment in the data center & embedded applications
Traditionally higher in power | Typically lower power draw
Well integrated into most CNN R&D frameworks such as Caffe | Loosely integrated with Caffe
Entrenched in the research community; used by most publications & researchers | FPGAs are extensively used in embedded applications
References
• [1] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062.
• [2] Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(8), 1915-1929.
• [3] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).
• [4] Badrinarayanan, V., Handa, A., & Cipolla, R. (2015). Segnet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labeling. arXiv preprint arXiv:1505.07293.
• [5] C. Liu, J. Yuen, and A. Torralba, “Sift flow: Dense correspondence across scenes and its applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 978-994, 2011.
• [6] Naveen Suda et al., “Throughput Optimized OpenCL-based FPGA Accelerator for Large-Scale CNNs,” ISFPGA 2016.
• [7] “Efficient Implementation of Neural Network Systems Built on FPGAs, Programmed with OpenCL,” Altera White Paper.
