© 2020 MathWorks
Deploying Deep Learning Applications on FPGAs with MATLAB
Jack Erickson
Technical Marketing
September 2020
Deep Learning Deployment on Embedded Devices
• Airborne Image Analysis
• Autonomous Driving
• Industrial Inspection
• Medical Image Analysis
• Wireless Modulation Classification
• Radar Signature Classification
System Requirements Drive Network Design
Industrial Inspection
• Hardware/Software Engineers
• Systems Engineer
• Deep Learning Practitioner
Requirements: Camera specs, Accuracy, Latency, Cost, Power
Challenges of Deploying Deep Learning to FPGA Hardware: Convolution
• Input image: 224x224; first convolution layer: 96 filters of 11x11x3, stride=4; output: 55x55x96
• Each stride is an 11x11x3 multiply-accumulate → 105M floating-point multiply operations!
• 96 filters of 11x11x3 32-bit parameters → 140k bytes
• 55x55x96 32-bit outputs → 1.16M bytes of activations
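The figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes the layer shape shown on the slide (96 filters of 11x11x3, a 55x55 output map, 32-bit values) and ignores biases; the variable names are illustrative only. The 55x55 output size is taken directly from the slide rather than derived from the 224 label, since an unpadded 11x11 kernel with stride 4 yields 55 outputs from a 227-wide input: (227 - 11)/4 + 1 = 55.

```matlab
% Conv1 shape as shown on the slide (names here are illustrative).
kernelSize    = [11 11 3];   % filter height, width, input channels
numFilters    = 96;
outputSize    = [55 55];     % output feature map height and width
bytesPerValue = 4;           % 32-bit single precision

multipliesPerOutput = prod(kernelSize);                 % 363 multiplies per output value
numOutputs = prod(outputSize) * numFilters;             % 290,400 output activations
multiplies = numOutputs * multipliesPerOutput;          % about 105M
paramBytes = numFilters * multipliesPerOutput * bytesPerValue;  % about 140k bytes
actBytes   = numOutputs * bytesPerValue;                % about 1.16M bytes

fprintf('multiplies: %d\nparameter bytes: %d\nactivation bytes: %d\n', ...
        multiplies, paramBytes, actBytes);
```

The multiply count scales with the output map size, while the parameter count does not, which is why this early layer is compute-heavy but small in memory.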
Challenges of Deploying Deep Learning to FPGA Hardware
                     input  conv1  conv2  conv3  conv4  conv5  fc6   fc7  fc8  Total
Parameters (bytes)   n/a    140K   1.2M   3.5M   5.2M   1.8M   148M  64M  16M  230M
Activations (bytes)  588K   1.1M   728K   252K   252K   168K   16K   16K  4K   3.1M
FLOPs                n/a    105M   223M   149M   112M   74M    37M   16M  4M   720M
FPGA resources: Off-chip RAM, Block RAM, DSP Slices
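As a rough check on the per-layer table, the sketch below sums the transcribed entries and compares where the parameters and the operations concentrate. The values are the rounded figures from the slide and the variable names are illustrative. The rounded parameter entries add up to about 240M rather than the 230M shown in the Total column (the slide does not say why), so the parameter share below uses the sum of the entries.

```matlab
% Per-layer values transcribed from the table (conv1..conv5, fc6..fc8).
paramBytes = [140e3 1.2e6 3.5e6 5.2e6 1.8e6 148e6 64e6 16e6];
actBytes   = [1.1e6 728e3 252e3 252e3 168e3 16e3 16e3 4e3];
flops      = [105e6 223e6 149e6 112e6 74e6 37e6 16e6 4e6];
inputBytes = 588e3;                       % activations of the input image

totalFlops = sum(flops);                  % 720M, matching the Total column
totalAct   = inputBytes + sum(actBytes);  % about 3.1M, matching the Total column

isFc = [false false false false false true true true];
fcParamShare  = sum(paramBytes(isFc)) / sum(paramBytes);  % about 0.95
convFlopShare = sum(flops(~isFc)) / sum(flops);           % about 0.92

fprintf('FC share of parameters: %.2f\nConv share of FLOPs: %.2f\n', ...
        fcParamShare, convFlopShare);
```

Under these numbers the fully connected layers hold most of the parameter bytes, while the convolution layers account for most of the operations.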
Deploying Deep Learning to FPGA Hardware Requires Collaboration
[Diagram labels: Resize; Acquire data; Output / display; Mem i/f]
Optimize
• Network / layers
• Fixed-point quantization
• Processor micro-architecture
A Collaborative AI Workflow
Data Preparation
• Data cleansing and preparation
• Simulation-generated data
• Human insight
AI Modeling
• Model design and tuning
• Hardware accelerated training
• Interoperability
System Design
• Integration with complex systems
• System verification and validation
• System simulation
Deployment
• Enterprise systems
• Embedded devices
• Edge, cloud, desktop
Iteration and Refinement
Design and Analyze Your Networks in MATLAB
• Classification Learner app to try different classifiers and find the best fit for your data set
• Deep Network Designer app to build, visualize, and edit deep learning networks
AI Modeling: Model design and tuning, Hardware accelerated training, Interoperability
MATLAB Interoperates with Other AI Frameworks
• Caffe importer
• Keras importer
AI Modeling: Model design and tuning, Hardware accelerated training, Interoperability
Deploy from MATLAB to a Variety of Hardware Platforms
Application logic
• FPGA
• CPU
• GPU
Deployment: Enterprise systems, Embedded devices, Edge, cloud, desktop
FPGA Deployment from MATLAB
Get Started Prototyping on FPGA with Deep Learning HDL Toolbox™
Hardware support package
Deep learning processor with I/O and external memory interfaces
• Int8 or single precision
• Supported boards:
  • Xilinx: ZCU102 or ZC706
  • Intel: Arria10 SoC
• http://mathworks.com/hardware-support.html
[Diagram labels: Application logic; FPGA Bitstream; Layer control instructions; Weights & Activations; Fully Connected Module; Convolution Module; Processor Control; Memory Access; Activations]
Defect Detection Example
Application logic
• Pre-processing: Extract regions and resize
• Inference (FPGA): Predict using trained network
• Post-processing: Annotate and label
Run Deep Learning on FPGA from MATLAB
Prototyping: Design Exploration and Customization
[Diagram labels: Application logic; Layer control instructions; Weights & Activations; Re-train]
Design Exploration and Customization
Optimizing Deep Learning Applications Requires Collaboration
• Systems Engineer
• Deep Learning Practitioner
• Hardware/Software Engineers
[Diagram labels: Fully Connected Module; Convolution Module; Processor Control; Memory Access; Activations]
[Diagram: one 32-bit multiply-accumulate unit (x, +/-, Σ) shown alongside four 8-bit multiply-accumulate units]
INT8 Quantization
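To make the idea concrete, the sketch below quantizes a weight tensor to int8 with a generic symmetric, per-tensor scale. This is an illustrative scheme chosen for brevity, not necessarily what Deep Learning HDL Toolbox or the deepNetworkQuantizer app does internally; the random weights and all variable names are assumptions.

```matlab
% Generic symmetric int8 quantization of a weight tensor (illustrative only).
weights = single(randn(11, 11, 3, 96));   % random data with the conv1 filter shape

scale    = max(abs(weights(:))) / 127;    % map the largest magnitude to 127
q        = int8(round(weights / scale));  % int8() saturates to [-128, 127]
restored = single(q) * scale;             % dequantize to inspect the error

maxError   = max(abs(weights(:) - restored(:)));  % about scale/2 at most
bytesFloat = numel(weights) * 4;          % storage as 32-bit single precision
bytesInt8  = numel(q);                    % storage as 8-bit integers

fprintf('scale: %g, max error: %g, size reduction: %.0fx\n', ...
        scale, maxError, bytesFloat / bytesInt8);
```

Storing each value in 8 bits instead of 32 cuts parameter memory by a factor of four, at the cost of a rounding error bounded by roughly half the scale.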
Deep Learning HDL Toolbox
[Diagram labels: Application logic; Layer control instructions; Weights & Activations; Re-train; Modify network]
% Create target object
hTarget = dlhdl.Target(…);
% Create workflow object, using the target
hW = dlhdl.Workflow(…);
% Compile the network
hW.compile;
% Program the bitstream and deploy the compiled network and weights
hW.deploy;
% Run prediction
[score, speed] = hW.predict(img, 'Profile', 'on');
>> deepNetworkDesigner
Parameters Speed
140 MB 18 fps
84 MB 45 fps
68 MB 139 fps
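The relative gains across these three rows can be worked out directly. The sketch below treats the first row as the baseline; the extracted slide text does not state which change (network modification or quantization) produced each row, so the rows are simply compared in the order shown. Variable names are illustrative.

```matlab
% Rows transcribed from the Parameters / Speed table, in the order shown.
paramMB = [140 84 68];
fps     = [18 45 139];

sizeRatio = paramMB / paramMB(1);   % 1.00, 0.60, about 0.49
speedup   = fps / fps(1);           % 1.00, 2.50, about 7.72

for k = 1:numel(fps)
    fprintf('%3d MB, %3d fps: %.2fx size, %.2fx speed\n', ...
            paramMB(k), fps(k), sizeRatio(k), speedup(k));
end
```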
Quantize
>> deepNetworkQuantizer
Generate HDL
Iterate and Converge on Deep Learning FPGA Deployment from MATLAB
Deploy from MATLAB to a Variety of Hardware Platforms
Application logic
• FPGA
• CPU
• GPU
Deep Learning HDL Toolbox
• Tune for system requirements
• Prototype from MATLAB
• Configure and generate RTL
Resource Slide
Deep Learning Solutions in MATLAB
https://www.mathworks.com/solutions/deep-learning.html
Deep Learning HDL Toolbox
https://www.mathworks.com/products/deep-learning-hdl.html
Onramp: Deep Learning in MATLAB
https://www.mathworks.com/learn/tutorials/deep-learning-onramp.html
MathWorks FPGA Solutions Page
https://www.mathworks.com/solutions/fpga-asic-soc-development.html

