"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power DSP," a Presentation from Google

Copyright © 2017 Google
Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power DSP
Pete Warden
May 2017

• Google’s open source library for machine intelligence
• tensorflow.org launched in Nov 2015
• Used by many production ML projects

TensorFlow and HVX

TensorFlow supports Qualcomm’s Hexagon DSP
• Models run 8x faster and draw 1.4 watts, versus ~5 watts on the CPU
Qualcomm Snapdragon 820 processor featuring the Hexagon DSP
DragonBoard 820c
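The speed and power figures compound. Assuming the 8x speedup and the two power numbers describe the same workload (the slide does not say exactly how they were measured), energy per inference is power × time, so the DSP needs roughly 1/28 of the CPU's energy. A minimal sketch of that arithmetic, with a hypothetical `energy_ratio` helper:

```python
def energy_ratio(speedup: float, accel_watts: float, cpu_watts: float) -> float:
    """CPU energy divided by accelerator energy for one inference.

    Energy = power * time, and the accelerator needs 1/speedup of the CPU's time,
    so the ratio is (cpu_watts * t) / (accel_watts * t / speedup).
    Assumes both power figures were measured on the same workload.
    """
    return cpu_watts * speedup / accel_watts

ratio = energy_ratio(speedup=8, accel_watts=1.4, cpu_watts=5.0)
print(round(ratio, 1))  # 28.6
```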

How did this happen?
• It started with my Embedded Vision Alliance talk last year, “Eight bits are enough”
• Conversations with Qualcomm made it clear that their existing Snapdragon 820 hardware offered real possibilities for deep learning
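“Eight bits are enough” rests on mapping float values onto uint8 with an affine scale and zero point, the scheme described in gemmlowp’s quantization documentation as real = scale × (quantized − zero_point). The NumPy sketch below assumes a simple min/max calibration; the helper names are hypothetical and not a TensorFlow or gemmlowp API.

```python
import numpy as np

def choose_params(min_val: float, max_val: float):
    """Pick a uint8 scale and zero point covering [min_val, max_val].

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / 255.0
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(-min_val / scale))
    zero_point = max(0, min(255, zero_point))
    return scale, zero_point

def quantize(x, scale, zero_point):
    """Map floats to the nearest uint8 code, clamping to [0, 255]."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate floats: real = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, -0.25, 0.0, 0.5, 2.0], dtype=np.float32)
scale, zp = choose_params(float(x.min()), float(x.max()))
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
print(q, float(np.abs(x - x_hat).max()))
```

Within the calibrated range the rounding error is at most half of `scale`, which is why eight bits usually suffice for inference.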

Next Steps
• Qualcomm implemented a quick sanity test using gemmlowp, our open source math library
• The test demonstrated 100 GOPs/s on realistic workloads using the HVX
• That is more than 5x the speed of the CPU
• Power usage was expected to be much lower
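gemmlowp’s core job is multiplying eight-bit matrices while accumulating in 32 bits. The sketch below shows that idea in NumPy under assumed quantization parameters: it widens the uint8 operands to int32, subtracts the zero points, and checks that rescaling the int32 result matches the float product. Optimized kernels such as gemmlowp typically avoid this widening by folding the offsets into row and column sums; this version is deliberately naive.

```python
import numpy as np

def quantized_matmul(a_q, a_zero_point, b_q, b_zero_point):
    """Multiply uint8 matrices, accumulating in int32 as low-precision GEMMs do.

    Operands are widened to int32 before their zero points are subtracted; each
    product is at most 255 * 255 in magnitude, so the int32 sum is safe for
    depths up to roughly 33,000.
    """
    a = a_q.astype(np.int32) - np.int32(a_zero_point)
    b = b_q.astype(np.int32) - np.int32(b_zero_point)
    return a @ b  # int32 accumulators

rng = np.random.default_rng(0)
a_q = rng.integers(0, 256, size=(4, 8), dtype=np.uint8)
b_q = rng.integers(0, 256, size=(8, 3), dtype=np.uint8)
a_scale, a_zp = 0.02, 128  # hypothetical quantization parameters
b_scale, b_zp = 0.01, 120

acc = quantized_matmul(a_q, a_zp, b_q, b_zp)
# Rescale the int32 result back to real values: scale_a * scale_b * acc.
approx = (a_scale * b_scale) * acc.astype(np.float64)
reference = (a_scale * (a_q.astype(np.float64) - a_zp)) @ (
    b_scale * (b_q.astype(np.float64) - b_zp)
)
print(acc.dtype, np.allclose(approx, reference))
```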

Benchmark Details
• The gemmlowp benchmark includes the m, n, k values for the InceptionV1 matrix multiplies
https://github.com/google/gemmlowp/blob/master/test/benchmark.cc#L283
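Given a list of (m, n, k) shapes, a throughput figure converts directly into an expected time. The sketch counts a multiply and an add as two operations, the usual convention for GOPs figures. The shapes below are hypothetical placeholders, not the InceptionV1 values from benchmark.cc.

```python
def gemm_ops(m: int, n: int, k: int) -> int:
    """Operation count for an (m x k) by (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k

def gemm_time_ms(m: int, n: int, k: int, gops_per_s: float) -> float:
    """Ideal time for one GEMM at a sustained throughput, in milliseconds."""
    return gemm_ops(m, n, k) / (gops_per_s * 1e9) * 1e3

# Hypothetical (m, n, k) shapes for illustration only.
shapes = [(64, 3136, 147), (192, 784, 576), (128, 196, 1152)]
total_ops = sum(gemm_ops(m, n, k) for m, n, k in shapes)

for rate in (25, 200):  # GOPs/s: CPU and HVX figures quoted in the results
    total_ms = sum(gemm_time_ms(m, n, k, rate) for m, n, k in shapes)
    print(f"{rate} GOPs/s: {total_ms:.2f} ms")
```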

Results
• gemmlowp benchmark results indicated around 200 GOPs/s on the HVX (versus 25 GOPs/s on the CPU)
• End-to-end inference turned out to take around 90 ms, versus 700 ms on the CPU
• The benchmark was a good predictor of real performance
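Why was the benchmark a good predictor? If inference time is dominated by matrix multiplies running at the benchmark rate on each processor (a simplifying assumption), scaling the 700 ms CPU latency by the 25/200 throughput ratio predicts 87.5 ms, within about 3% of the measured ~90 ms. A sketch with a hypothetical helper:

```python
def predicted_latency_ms(cpu_latency_ms: float, cpu_gops: float, accel_gops: float) -> float:
    """Scale CPU latency by the benchmark throughput ratio.

    Assumes inference time is dominated by matrix multiplies that run at the
    benchmark rate on each processor.
    """
    return cpu_latency_ms * cpu_gops / accel_gops

predicted = predicted_latency_ms(700, 25, 200)
measured = 90  # approximate end-to-end figure on the HVX
error_pct = abs(measured - predicted) / measured * 100
print(predicted, round(error_pct, 1))  # 87.5 2.8
```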

TensorFlow code is at
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/hvx
Qualcomm code is at
https://source.codeaurora.org/quic/hexagon_nn/nnlib

• The CPU assembles a batch of ops
• It then sends the batch to the HVX via FastRPC
• The HVX runs the batch within its own code loop
• The HVX signals the application processor (AP) when it's done
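The batch-and-signal pattern above can be simulated on one machine. In this sketch a worker thread stands in for the HVX, a `queue.Queue` stands in for the FastRPC transport, and a `threading.Event` stands in for the completion signal. `FakeDsp` and its methods are hypothetical; this is not the hexagon_nn interface, only an illustration of why a single hand-off per batch beats one call per op.

```python
import queue
import threading

class FakeDsp:
    """Hypothetical stand-in for the DSP: runs submitted op batches in its own loop."""

    def __init__(self):
        self._inbox = queue.Queue()  # plays the role of the FastRPC transport
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown sentinel
                break
            ops, value, result, done = item
            for op in ops:  # run the whole batch without returning to the CPU
                value = op(value)
            result.append(value)
            done.set()  # signal the application processor

    def submit(self, ops, value):
        """Hand off one batch; returns a result list and an Event to wait on."""
        result, done = [], threading.Event()
        self._inbox.put((ops, value, result, done))
        return result, done

    def close(self):
        self._inbox.put(None)
        self._thread.join()

# CPU side: assemble the batch once, then hand it off in a single call.
batch = [lambda x: x * 2, lambda x: x + 3, lambda x: max(x, 0)]
dsp = FakeDsp()
result, done = dsp.submit(batch, 5)
done.wait()
print(result[0])  # 13
dsp.close()
```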

• TensorFlow handles splitting the graph between the HVX and the CPU
• The same mechanism is available for other accelerators too
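One simple way to split a graph is to walk the ops in topological order and group consecutive ops by whether the accelerator supports them. The sketch below treats the graph as a linear chain, a simplification of real DAGs, and uses a hypothetical supported-op set; TensorFlow's own partitioning logic lives in the contrib/hvx code and is not reproduced here. The `accel` parameter shows how the same approach carries over to other accelerators.

```python
from itertools import groupby

def partition(ops, supported, accel="HVX", host="CPU"):
    """Split a topologically ordered list of (name, op_type) pairs into
    contiguous segments, each assigned to the accelerator or the host CPU."""
    segments = []
    for on_accel, group in groupby(ops, key=lambda op: op[1] in supported):
        segments.append((accel if on_accel else host, [name for name, _ in group]))
    return segments

# Hypothetical graph and supported-op set, for illustration only.
graph = [
    ("decode", "DecodeJpeg"),
    ("resize", "ResizeBilinear"),
    ("conv1", "QuantizedConv2D"),
    ("relu1", "QuantizedRelu"),
    ("pool1", "QuantizedMaxPool"),
    ("softmax", "Softmax"),
]
supported = {"QuantizedConv2D", "QuantizedRelu", "QuantizedMaxPool"}
segments = partition(graph, supported)
for device, names in segments:
    print(device, names)
```

Each CPU/accelerator boundary costs a data transfer, so grouping contiguous runs keeps the number of hand-offs low.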

Embedded TensorFlow

We work closely with chip builders

• Examples
• ARM’s Compute Library
• Movidius’s mvTensor tool
• CEVA’s conversion tools
• Intel’s contributions to https://github.com/google/gemmlowp
• Qualcomm’s HVX collaboration

Why?

Lots of demand
• Mobile app developers (including Snapchat)
• Device builders: home, drones, industrial, medical, automotive

What’s TensorFlow particularly good at?
• Full support for eight-bit arithmetic
• Full stack: researchers, data centers, mobile apps, embedded devices
• Google’s main machine learning framework
• Shipping in vision features of many apps, including PhotoScan and Snapchat

Embedded TensorFlow Roadmap
• Support for eight-bit training
• On-device training (already being used by Google Keyboard)
• Better export pipeline (Graph Transform Tool)
• Raspberry Pi support
• Experimental Jetson TX1 support
• Other chips?
• Many more examples

Collaborations with hardware vendors
• ARM and Intel added code to https://github.com/google/gemmlowp
• Worked with many others to support the TensorFlow file format in their conversion pipelines
• We’re always open to conversations about our requirements and about porting

Future
• TensorFlow hands-on training class from the Embedded Vision Alliance, July 13 in Santa Clara
• We’re always looking for chip, tool, and systems companies to collaborate with
• Please get in touch!
• petewarden@google.com
