"Large-Scale Deep Learning for Building Intelligent Computer Systems," a Keynote Presentation from Google

Large-Scale Deep Learning for
Intelligent Computer Systems
Jeff Dean
Google Brain team
g.co/brain
In collaboration with many other people at Google

Building Intelligent Products
Really hard without understanding
Not there yet, but making significant progress

What do I mean by understanding?

Outline
● Why deep neural networks?
● Examples of using these for solving real problems
● TensorFlow: software infrastructure for our work (and yours!)
● What are different ways you can get started using these?

Google Brain project started in 2011, with a focus on
pushing state-of-the-art in neural networks. Initial
emphasis:
● use large datasets, and
● large amounts of computation
to push boundaries of what is possible in perception and
language understanding

Growing Use of Deep Learning at Google
Across many products/areas:
Android
Apps
Drug discovery
Gmail
Image understanding
Maps
Natural language understanding
Photos
Robotics research
Speech
Translation
YouTube
... many others ...
[Chart: number of unique project directories containing model description files, over time]

The promise (or wishful dream) of Deep Learning
[Diagram: Speech, Text, Search Queries, Images, Videos, Labels, Entities, Words, Audio, and Features appear on both sides of a set of simple, reconfigurable, high-capacity, trainable end-to-end building blocks]

The promise (or wishful dream) of Deep Learning
Common representations across domains.
Replacing piles of code with
data and simple learning algorithms.
Would merely be an interesting academic exercise…
…if it didn’t work so well!

In Research and Industry
Speech Recognition
Speech Recognition with Deep Recurrent Neural Networks
Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton
Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks
Tara N. Sainath, Oriol Vinyals, Andrew Senior, Hasim Sak
Object Recognition and Detection
Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
Scalable Object Detection using Deep Neural Networks
Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov

In Research and Industry
Machine Translation
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
Language Modeling
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Philipp Koehn, Tony Robinson
Parsing
Grammar as a Foreign Language
Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton

What is Deep Learning?
● A powerful class of machine learning model
● Modern reincarnation of artificial neural networks
● Collection of simple, trainable mathematical functions
● Compatible with many variants of machine learning
● Loosely based on (what little) we know about the brain
[Figure: network producing the label “cat”]

Learning algorithm
While not done:
Pick a random training example “(input, label)”
Run neural network on “input”
Adjust weights on edges to make output closer to “label”
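
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the training code used at Google: it assumes a one-input linear model in place of a neural network, a squared-error objective, and a fixed step budget instead of a real stopping test. The names `train`, `learning_rate`, and `examples` are made up for this sketch.

```python
import random

def train(examples, steps=2000, learning_rate=0.05, seed=0):
    """Stochastic gradient descent on a one-input linear model y = w*x + b."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                        # the adjustable "weights on edges"
    for _ in range(steps):                 # "while not done" (fixed budget here)
        x, label = rng.choice(examples)    # pick a random training example
        output = w * x + b                 # run the model on the input
        error = output - label             # distance between output and label
        # Gradient of 0.5 * error**2 with respect to w and b; step against it.
        w -= learning_rate * error * x
        b -= learning_rate * error
    return w, b

# Toy data generated from y = 2x + 1, so training should recover w=2, b=1.
examples = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = train(examples)
```

Each pass touches a single example, which is why the same loop scales to very large datasets: the cost of one update does not depend on how much data there is.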

Plenty of raw data
● Text: trillions of words of English + other languages
● Visual data: billions of images and videos
● Audio: tens of thousands of hours of speech per day
● User activity: queries, marking messages spam, etc.
● Knowledge graph: billions of labelled relation triples
● ...
How can we build systems that truly understand this data?

Important Property of Neural Networks
Results get better with
more data +
bigger models +
more computation
(Better algorithms, new insights and improved
techniques always help, too!)

What are some ways that
deep learning is having
a significant impact at Google?

Speech Recognition
[Diagram: Acoustic Input → Deep Recurrent Neural Network → Text Output (“How cold is it outside?”)]
Reduced word errors by more than 30%
Google Research Blog - August 2012, August 2015

ImageNet Challenge
Given an image, predict one of 1000 different classes
Image credit: www.cs.toronto.edu/~fritz/absps/imagenet.pdf

The Inception Architecture (GoogLeNet, 2014)
Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
ArXiv 2014, CVPR 2015

Neural Nets: Rapid Progress in Image Recognition
ImageNet challenge classification task

Team                              Year   Place   Error (top-5)
XRCE (pre-neural-net explosion)   2011   1st     25.8%
Supervision (AlexNet)             2012   1st     16.4%
Clarifai                          2013   1st     11.7%
GoogLeNet (Inception)             2014   1st     6.66%
Andrej Karpathy (human)           2014   N/A     5.1%
BN-Inception (Arxiv)              2015   N/A     4.9%
Inception-v3 (Arxiv)              2015   N/A     3.46%
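
For readers unfamiliar with the metric, top-5 error counts an image as wrong only when the true class is not among the model's five highest-scoring classes. The sketch below shows the computation on made-up scores; the function name `top5_error` and the toy data (8 classes rather than 1000) are assumptions for illustration.

```python
def top5_error(scores_per_image, true_labels):
    """Fraction of images whose true label is not among the 5 top-scoring classes."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:5]:
            misses += 1
    return misses / len(true_labels)

# Hypothetical scores for 2 images over 8 classes.
scores = [
    [0.1, 0.9, 0.3, 0.2, 0.8, 0.7, 0.6, 0.0],  # top 5 classes: 1, 4, 5, 6, 2
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],  # top 5 classes: 0, 1, 2, 3, 4
]
labels = [2, 7]  # first image is a hit, second is a miss
error = top5_error(scores, labels)
```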

Good Fine-Grained Classification

Good Generalization
Both recognized as “meal”

Google Photos Search
[Diagram: Your Photo → Deep Convolutional Neural Network → Automatic Tag (“ocean”)]
Search personal photos without tags.
Google Research Blog - June 2013

Image Captions
[Diagram: caption generated one word at a time: “A young girl”, then “A young girl asleep”]
[Vinyals et al., CVPR 2015]

Model: A close up of a child
holding a stuffed animal.
Human: A young girl asleep on
the sofa cuddling a stuffed
bear.
Model: A baby is asleep next to
a teddy bear.

What do you want in a machine learning system?
● Ease of expression: for lots of crazy ML ideas/algorithms
● Scalability: can run experiments quickly
● Portability: can run on wide variety of platforms
● Reproducibility: easy to share and reproduce research
● Production readiness: go from research to real products

TensorFlow:
Second Generation Deep Learning System

http://tensorflow.org/
and
https://github.com/tensorflow/tensorflow
If we like it, wouldn’t the rest of the world like it, too?
Open sourced single-machine TensorFlow on Monday, Nov. 9th, 2015
● Flexible Apache 2.0 open source licensing
● Updates for distributed implementation coming soon

http://tensorflow.org/

http://tensorflow.org/whitepaper2015.pdf

https://github.com/tensorflow/tensorflow
Source on GitHub

Motivations
DistBelief (1st system) was great for scalability, and
production training of basic kinds of models
Not as flexible as we wanted for research purposes
Better understanding of problem space allowed us to
make some dramatic simplifications

TensorFlow: Expressing High-Level ML Computations
● Core in C++
○ Very low overhead
● Different front ends for specifying/driving the computation
○ Python and C++ today, easy to add more
[Diagram: C++ front end, Python front end, ... on top of the Core TensorFlow Execution System, which targets CPU, GPU, Android, iOS, ...]

Computation is a dataflow graph
Graph of Nodes, also called Operations or ops.
[Diagram: graph with nodes MatMul, Add, Relu, Xent and inputs examples, weights, biases, labels]

Computation is a dataflow graph with tensors
[Diagram: graph with nodes MatMul, Add, Relu, Xent and inputs examples, weights, biases, labels]
Edges are N-dimensional arrays: Tensors
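
To make the graph idea concrete, here is a small pure-Python sketch of a dataflow graph whose nodes are ops and whose edges carry nested lists standing in for tensors. It is not the TensorFlow API. The wiring (examples and weights into MatMul, then Add with biases, then Relu, then Xent against labels) is an assumption read off the node names on the slide, and `GRAPH`, `run`, and `feeds` are names invented for this example.

```python
import math

def matmul(a, b):
    """Matrix product of two 2-D nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, bias):
    """Add a bias vector to every row of a 2-D nested list."""
    return [[x + b for x, b in zip(row, bias)] for row in a]

def relu(a):
    return [[max(0.0, x) for x in row] for row in a]

def xent(logits, labels):
    """Mean softmax cross-entropy; labels are one-hot rows."""
    total = 0.0
    for row, target in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total -= sum(t * (x - log_z) for x, t in zip(row, target))
    return total / len(logits)

# Graph: node name -> (op, names of the nodes feeding it).
GRAPH = {
    "MatMul": (matmul, ["examples", "weights"]),
    "Add": (add, ["MatMul", "biases"]),
    "Relu": (relu, ["Add"]),
    "Xent": (xent, ["Relu", "labels"]),
}

def run(graph, target, feeds):
    """Evaluate `target` by recursively evaluating the nodes it depends on."""
    if target in feeds:
        return feeds[target]
    op, inputs = graph[target]
    return op(*(run(graph, name, feeds) for name in inputs))

feeds = {
    "examples": [[1.0, 2.0]],
    "weights": [[1.0, -1.0], [0.5, 0.5]],
    "biases": [0.0, 1.0],
    "labels": [[1.0, 0.0]],
}
loss = run(GRAPH, "Xent", feeds)
```

Because the computation is described as data before it is executed, the same description can be handed to different back ends or split across devices, which is the property the following slides build on. This toy evaluator recomputes shared inputs and does no scheduling; a real system would not.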

Computation is a dataflow graph with state
[Diagram: graph with nodes Add, Mul, −= and inputs biases, learning rate, ...]
'Biases' is a variable
−= updates biases
Some ops compute gradients

Computation is a dataflow graph, distributed
[Diagram: the graph with nodes Add, Mul, −= and inputs biases, learning rate, ... placed on Device A and Device B]
Devices: Processes, Machines, GPUs, etc

Automatically runs models on range of platforms:
from phones ...
to single machines (CPU and/or GPUs) …
to distributed systems of many 100s of GPU cards

Mobile and Embedded Deployment Desires
● Execution efficiency
● Low power consumption
● Modest size requirements

Quantization: Using Low Precision Integer Math
● Train using 32-bit floats, and after training, convert
parameters to quantized 8-bit integer representations
● We have used this in many different applications:
○ Very minor losses in overall model accuracy. E.g.:
■ ~77% top-1 accuracy for image model with 32-bit float,
~76% top-1 accuracy with 8-bit quantized integers
○ 8-bit math gives close to 4X speedup and 4X reduction in model size
○ Saves considerable power, as well
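
As a rough illustration of the idea, the sketch below maps a list of float parameters onto 8-bit codes and back. It assumes a simple min/max linear scheme chosen for clarity; the slides do not say which scheme TensorFlow or gemmlowp uses, and the function names are made up.

```python
def quantize(values):
    """Map floats linearly onto integers 0..255 using the min/max of `values`."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0   # fall back to 1.0 when all values are equal
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats from 8-bit codes."""
    return [lo + c * scale for c in codes]

weights = [-1.0, -0.25, 0.0, 0.3, 0.75, 1.0]   # stand-in for trained parameters
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# One byte per parameter instead of four for a 32-bit float.
size_ratio = (4 * len(weights)) / len(bytes(codes))
```

The 4X size reduction follows directly from storing one byte per parameter instead of four. The speedup and accuracy figures on the slide are empirical and are not reproduced by this sketch.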

gemmlowp
Support for this open-sourced in gemmlowp package:
https://github.com/google/gemmlowp
Efficient GEMM implementations for ARM and x86
Ongoing performance work to make it even better

TensorFlow and Quantized Models
Support for quantized integer kernels
Automated tool coming shortly to
tensorflow/contrib/quantization/quantize_graph
Converts TensorFlow model/graph trained using 32-bit floats
● Emits new graph and associated parameter checkpoint
● Uses quantized ops where equivalent ops exist
● Falls back to float when no equivalent quantized op exists
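
The fallback behaviour in the last two bullets can be pictured as a pass over the graph's ops. The sketch below is purely illustrative: the mapping `QUANTIZED_EQUIVALENTS` and the function `rewrite` are hypothetical, the graph is reduced to a flat list of op names, and nothing here reflects how the actual quantize_graph tool is implemented.

```python
# Hypothetical table of ops with quantized equivalents; not TensorFlow's real list.
QUANTIZED_EQUIVALENTS = {
    "MatMul": "QuantizedMatMul",
    "Relu": "QuantizedRelu",
}

def rewrite(op_names):
    """Return a new op list, swapping in a quantized op where one exists.

    Ops without an entry in the table are kept unchanged, i.e. they stay float.
    """
    return [QUANTIZED_EQUIVALENTS.get(name, name) for name in op_names]

float_graph = ["MatMul", "Add", "Relu", "Xent"]
quantized_graph = rewrite(float_graph)
```

The input list is left untouched and a new one is returned, mirroring the "emits new graph" bullet.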

TensorFlow and Mobile Execution
● Android support already there
○ Example app in TensorFlow GitHub repository under:
■ tensorflow/examples/android/...
● iOS support coming shortly:
○ https://github.com/tensorflow/tensorflow/issues/16
○ https://github.com/tensorflow/tensorflow/pull/1631

To Learn More
● Attend Pete Warden’s talk tomorrow (Tuesday, May 3),
10:30 to 11:15 AM
“TensorFlow: Enabling Mobile and Embedded Machine Intelligence”
http://www.embedded-vision.com/summit/tensorflow-enabling-mobile-and-embedded-machine-intelligence

How Can You Get Started with Machine Learning?
Four ways, with varying complexity:
(1) Use a Cloud-based API (Vision, Speech, etc.)
(2) Run your own pretrained model
(3) Use an existing model architecture, and
retrain it or fine tune on your dataset
(4) Develop your own machine learning models
for new problems
(Arrow down the list: more flexible, but more effort required)

(1) Use Cloud-based APIs
cloud.google.com/translate
cloud.google.com/speech
cloud.google.com/vision
cloud.google.com/text

Google Cloud Vision API
https://cloud.google.com/vision/

(2) Using a Pre-trained Image Model with TensorFlow
www.tensorflow.org/tutorials/image_recognition/index.html

Using a Pre-trained Image Model with TensorFlow on Android
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android

(3) Training a Model on Your Own Image Data
www.tensorflow.org/versions/master/how_tos/image_retraining/index.html

(4) Develop your own machine learning models
https://www.tensorflow.org/versions/master/get_started/basic_usage.html

What Does the Future Hold?
Deep learning usage will continue to grow and accelerate:
● Across more and more fields and problems:
○ robotics, self-driving vehicles, ...
○ health care
○ video understanding
○ dialogue systems
○ personal assistance
○ ...

Combining Vision with Robotics
“Deep Learning for Robots: Learning
from Large-Scale Interaction”, Google
Research Blog, March, 2016
“Learning Hand-Eye Coordination for
Robotic Grasping with Deep Learning
and Large-Scale Data Collection”,
Sergey Levine, Peter Pastor, Alex
Krizhevsky, & Deirdre Quillen,
Arxiv, arxiv.org/abs/1603.02199

Conclusions
Deep neural networks are making significant strides in understanding:
In speech, vision, language, search, …
If you’re not considering how to use deep neural nets to solve your vision or
understanding problems, you almost certainly should be
Pre-trained models or pre-trained APIs are a low overhead way of starting to
explore
TensorFlow makes it easy for everyone to experiment with these techniques
● Highly scalable design allows faster experiments, accelerates research
● Easy to share models and to publish code to give reproducible results
● Ability to go from research to production within same system

Further Reading
● Le, Ranzato, Monga, Devin, Chen, Corrado, Dean, & Ng. Building High-Level Features
Using Large Scale Unsupervised Learning, ICML 2012. research.google.com/archive/unsupervised_icml2012.html
● Dean, et al., Large Scale Distributed Deep Networks, NIPS 2012, research.google.com/archive/large_deep_networks_nips2012.html.
● Sutskever, Vinyals, & Le, Sequence to Sequence Learning with Neural Networks, NIPS,
2014, arxiv.org/abs/1409.3215.
● Vinyals, Toshev, Bengio, & Erhan. Show and Tell: A Neural Image Caption Generator.
CVPR 2015. arxiv.org/abs/1411.4555
● Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna. Rethinking the Inception Architecture for
Computer Vision. arxiv.org/abs/1512.00567
● TensorFlow white paper, tensorflow.org/whitepaper2015.pdf (clickable links in bibliography)
research.google.com/people/jeff
research.google.com/pubs/MachineIntelligence.html
Questions?