Copyright © 2017 Google 1
Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power DSP
Pete Warden
May 2017

• TensorFlow: Google’s open source library for machine intelligence
• tensorflow.org launched in November 2015
• Used by many production ML projects

TensorFlow and HVX (Hexagon Vector eXtensions)

TensorFlow supports Qualcomm’s Hexagon DSP
• Models run 8X faster and use 1.4 watts, versus ~5 watts on the CPU
[Images: Qualcomm Snapdragon 820 processor featuring the Hexagon DSP; DragonBoard 820c]
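Combining the two figures on this slide gives a rough sense of energy per inference, since energy is power multiplied by time. A back-of-the-envelope sketch, assuming the 8X speedup and the 1.4 W versus ~5 W power draws were measured on the same workload and that power draw stays constant over the run (neither is stated on the slide):

```python
def energy_ratio(speedup: float, dsp_watts: float, cpu_watts: float) -> float:
    """Return how many times less energy the DSP uses for the same work.

    Energy = power * time. If the DSP finishes `speedup` times faster,
    its run time is t_cpu / speedup, so
    E_cpu / E_dsp = (cpu_watts * t_cpu) / (dsp_watts * t_cpu / speedup)
                  = speedup * cpu_watts / dsp_watts.
    Assumes constant power draw on both processors during the run.
    """
    return speedup * cpu_watts / dsp_watts


# Figures from the slide: 8X faster, 1.4 W on the DSP vs ~5 W on the CPU.
ratio = energy_ratio(speedup=8.0, dsp_watts=1.4, cpu_watts=5.0)
print(f"~{ratio:.0f}x less energy for the same work")
```

Under those assumptions the speed and power advantages multiply, which is why the energy gap is much larger than either figure alone.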

How did this happen?
• Started with my Embedded Vision Alliance talk last year
  • “Eight bits are enough”
• Conversations with Qualcomm made it clear that there were possibilities with their existing hardware in the Snapdragon 820
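The “eight bits are enough” idea is that float weights and activations can be stored as 8-bit integers plus a scale and a zero point. A minimal sketch of an affine (scale/zero-point) scheme of the kind gemmlowp uses, where real = scale * (q - zero_point), assuming the float range [min_val, max_val] is already known; `choose_params`, `quantize`, and `dequantize` are illustrative names, not TensorFlow or gemmlowp APIs:

```python
def choose_params(min_val: float, max_val: float) -> tuple[float, int]:
    """Pick scale and zero_point so [min_val, max_val] maps onto uint8 [0, 255]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / 255.0
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(-min_val / scale)
    return scale, int(min(max(zero_point, 0), 255))


def quantize(x: float, scale: float, zero_point: int) -> int:
    """real -> uint8: q = clamp(round(x / scale) + zero_point, 0, 255)."""
    return int(min(max(round(x / scale) + zero_point, 0), 255))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """uint8 -> real: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)


scale, zp = choose_params(-1.0, 3.0)
for x in (-1.0, 0.0, 0.5, 3.0):
    q = quantize(x, scale, zp)
    print(x, q, round(dequantize(q, scale, zp), 4))
```

Each value round-trips to within half a quantization step (scale / 2), which is the precision loss eight-bit inference trades for smaller, faster arithmetic.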

Next Steps
• Qualcomm implemented a quick sanity test using gemmlowp, our open source low-precision matrix multiplication library
• It demonstrated 100 GOPs/s on realistic workloads using HVX
  • More than 5x the speed of the CPU
  • Power usage expected to be much lower
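At its core, a test like this exercises an 8-bit matrix multiply. A minimal pure-Python sketch of the arithmetic a gemmlowp-style kernel performs, assuming uint8 inputs with zero points (offsets) and an integer accumulator; real kernels such as gemmlowp’s or Qualcomm’s HVX code are heavily vectorized, and `quantized_matmul` is a hypothetical name that shows only the math:

```python
def quantized_matmul(a, b, a_zero_point, b_zero_point):
    """Multiply an m x k uint8 matrix `a` by a k x n uint8 matrix `b`.

    Each product uses (a - a_zero_point) * (b - b_zero_point), summed into an
    integer accumulator (int32 in real kernels; Python ints never overflow).
    Returns the m x n matrix of raw accumulators.
    """
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    assert k == k_b, "inner dimensions must match"
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += (a[i][p] - a_zero_point) * (b[p][j] - b_zero_point)
            out[i][j] = acc
    return out


a = [[130, 120], [128, 140]]  # 2 x 2 uint8 values, zero point 128
b = [[100, 110], [120, 128]]  # 2 x 2 uint8 values, zero point 128
print(quantized_matmul(a, b, 128, 128))
```

In a full pipeline the wide accumulators are rescaled back to uint8 for the next layer; that step is omitted here.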

Benchmark Details
• The gemmlowp project lists the m, n, k values for the InceptionV1 matrix multiplies:
  https://github.com/google/gemmlowp/blob/master/test/benchmark.cc#L283
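An (m x k) by (k x n) matrix multiply performs m*n*k multiply-accumulates, so a GOPs/s figure follows from the shapes and the measured time. A sketch of that bookkeeping, assuming the common convention of counting each multiply-accumulate as two operations; the shapes and timing below are made-up placeholders, not the InceptionV1 values from benchmark.cc:

```python
def gemm_ops(m: int, n: int, k: int) -> int:
    """Operations in one (m x k) by (k x n) GEMM: m*n*k MACs, 2 ops each."""
    return 2 * m * n * k


def gops_per_second(shapes, seconds: float) -> float:
    """Throughput in GOPs/s for a list of (m, n, k) GEMMs run in `seconds`."""
    total_ops = sum(gemm_ops(m, n, k) for m, n, k in shapes)
    return total_ops / seconds / 1e9


# Hypothetical shapes and timing, for illustration only.
shapes = [(256, 1024, 512), (128, 2048, 256)]
print(f"{gops_per_second(shapes, seconds=0.01):.1f} GOPs/s")
```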

Results
• gemmlowp results indicated around 200 GOPs/s on HVX (versus 25 GOPs/s on the CPU)
• End to end, it turned out to be around 90 ms, versus 700 ms on the CPU
• The gemmlowp benchmark was a good predictor of end-to-end performance
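The claim that the microbenchmark was a good predictor can be checked with the slide’s own numbers by comparing the speedup gemmlowp predicted with the speedup measured end to end. A small sketch of that comparison, assuming the model’s run time is dominated by matrix multiplies so that GEMM throughput should translate into latency:

```python
# Figures from the slide.
HVX_GOPS, CPU_GOPS = 200.0, 25.0   # gemmlowp throughput, GOPs/s
HVX_MS, CPU_MS = 90.0, 700.0       # end-to-end latency, ms

predicted = HVX_GOPS / CPU_GOPS    # throughput ratio from the microbenchmark
measured = CPU_MS / HVX_MS         # latency ratio from the full model
error = abs(predicted - measured) / measured

print(f"predicted {predicted:.1f}x, measured {measured:.1f}x, error {error:.0%}")
```

The predicted 8x and the measured ~7.8x differ by only a few percent, which is consistent with the slide’s conclusion.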

TensorFlow code is at
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/hvx
Qualcomm’s code is at
https://source.codeaurora.org/quic/hexagon_nn/nnlib

• Works by assembling a batch of ops on the CPU
• Then sends them off to HVX via FastRPC
• HVX runs them within its own code loop
• Signals the application processor (AP) when it’s done
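The offload pattern above (batch on the CPU, one call across, a loop on the accelerator, a completion signal back) can be sketched with a worker thread standing in for HVX. This is a toy model: `Accelerator`, `submit_batch`, and the op tuples are hypothetical names, and a Python queue plus `threading.Event` stand in for FastRPC and the hardware signal, which work very differently in practice:

```python
import queue
import threading


class Accelerator:
    """Hypothetical stand-in for the DSP: runs whole batches in its own loop."""

    def __init__(self):
        self._inbox = queue.Queue()
        worker = threading.Thread(target=self._loop, daemon=True)
        worker.start()

    def _loop(self):
        # The accelerator's own code loop: take a batch, run every op, signal.
        while True:
            ops, inputs, results, done = self._inbox.get()
            value = inputs
            for _name, fn in ops:
                value = fn(value)
            results.append(value)
            done.set()  # signal the application processor (AP)

    def submit_batch(self, ops, inputs):
        """Hand over a whole batch in one call (standing in for one FastRPC call)."""
        results, done = [], threading.Event()
        self._inbox.put((ops, inputs, results, done))
        return results, done


# The CPU assembles the whole batch of ops first...
batch = [("scale", lambda x: [v * 2 for v in x]),
         ("relu", lambda x: [max(v, 0) for v in x])]
dsp = Accelerator()
results, done = dsp.submit_batch(batch, [-1, 2, 3])
done.wait(timeout=5)  # ...then waits for the completion signal.
print(results[0])
```

The usual motivation for batching is that each cross-processor call carries fixed overhead, so paying it once per batch rather than once per op amortizes that cost.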

• TensorFlow handles splitting up the graph between HVX and the CPU
• The same mechanism is available for other accelerators too
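A minimal sketch of the kind of split described above: walk the ops in topological order and group consecutive ops that the accelerator supports into one segment, leaving the rest on the CPU. The `HVX_SUPPORTED` set and the example graph are hypothetical, and this linear grouping is an illustration, not TensorFlow’s actual partitioning code:

```python
from itertools import groupby

# Hypothetical set of op types the accelerator can run.
HVX_SUPPORTED = {"QuantizedConv2D", "QuantizedRelu", "QuantizedMaxPool"}


def partition(ops):
    """Split a topologically ordered list of op types into device segments.

    Returns a list of (device, [ops]) tuples, merging consecutive ops that
    run on the same device so each accelerator segment can be sent as one batch.
    """
    def device(op):
        return "HVX" if op in HVX_SUPPORTED else "CPU"

    return [(dev, list(group)) for dev, group in groupby(ops, key=device)]


graph = ["DecodeJpeg", "QuantizedConv2D", "QuantizedRelu",
         "QuantizedMaxPool", "Softmax"]
for dev, segment in partition(graph):
    print(dev, segment)
```

Merging consecutive supported ops keeps the number of CPU-to-accelerator handoffs low, which fits the batch-based offload model on the previous slide.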

Embedded TensorFlow

We work closely with chip builders
• Examples:
  • ARM’s Compute Library
  • Movidius’s mvTensor tool
  • CEVA’s conversion tools
  • Intel’s contributions to https://github.com/google/gemmlowp
  • Qualcomm’s HVX collaboration

Why?

Lots of demand
• Mobile app developers (including Snapchat)
• Device builders:
  • Home
  • Drones
  • Industrial
  • Medical
  • Automotive

What’s TensorFlow particularly good at?
• Full support for eight-bit (quantized) computation
• Full stack: researchers, data centers, mobile apps, embedded devices
• Main framework at Google
• Shipping for vision in many apps, including PhotoScan and Snapchat

Embedded TensorFlow Roadmap
• Support for eight-bit training
• On-device training (already being used by Google Keyboard)
• Better export pipeline (Graph Transform Tool)
• Raspberry Pi
• Jetson TX1 experimental support
• Other chips?
• Many more examples

Collaborations with hardware vendors
• ARM and Intel added code to https://github.com/google/gemmlowp
• Worked with many others to support the TensorFlow file format in their conversion pipelines
• We’re always open to conversations about our requirements and about porting

Future
• TensorFlow hands-on training class from the Embedded Vision Alliance, July 13 in Santa Clara
• We’re always looking for chip, tools, and systems companies to collaborate with
• Please get in touch!
  • petewarden@google.com
