Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang

INTRODUCTION TO
MULTI-GPU DEEP LEARNING
WITH DIGITS 2
Mike Wang
NVIDIA
PAPIs.io | Sydney | 6-7 August 2015

2
AGENDA
1 What is Deep Learning?
2 GPUs and Deep Learning
3 NVIDIA DIGITS

4
DEEP LEARNING & AI
Deep Learning has become the most popular approach to developing Artificial Intelligence (AI) – machines that perceive and understand the world.
The focus is currently on specific perceptual tasks, and there are many successes.
Today, some of the world’s largest internet companies, as well as the foremost research institutions, are using GPUs for deep learning in research and production.
CUDA for Deep Learning

5
PRACTICAL DEEP LEARNING EXAMPLES
Image Classification, Object Detection, Localization,
Action Recognition, Scene Understanding
Speech Recognition, Speech Translation,
Natural Language Processing
Pedestrian Detection, Traffic Sign Recognition
Breast Cancer Cell Mitosis Detection,
Volumetric Brain Image Segmentation

6
TRADITIONAL MACHINE PERCEPTION – HAND-TUNED FEATURES
Raw data → Feature extraction → Classifier/detector → Result
Classifier/detector: SVM, shallow neural net, …
Classifier/detector: HMM, shallow neural net, … → Result: Speaker ID, speech transcription, …
Classifier/detector: Clustering, HMM, LDA, LSA, … → Result: Topic classification, machine translation, sentiment analysis, …

7
DEEP LEARNING APPROACH
Train: [figure: example images labeled Dog, Cat, Honey badger, Raccoon; Errors are fed back to the network]
Deploy: [figure: a new image is classified as Dog]

8
SOME DEEP LEARNING USE CASES
Jeff Dean, Google, GTC 2015

10
ARTIFICIAL NEURAL NETWORK (ANN)
A collection of simple, trainable mathematical units that
collectively learn complex functions
From Stanford cs231n lecture notes
[Figure: biological neuron beside an artificial neuron]
Artificial neuron: inputs x1, x2, x3; weights w1, w2, w3; output y
y = F(w1x1 + w2x2 + w3x3)
F(x) = max(0, x)
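The artificial neuron formula on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the function names `relu` and `neuron` and the example weights and inputs are assumptions, not taken from the slides.

```python
def relu(x):
    """Activation from the slide: F(x) = max(0, x)."""
    return max(0.0, x)


def neuron(weights, inputs):
    """Compute y = F(w1*x1 + w2*x2 + w3*x3) for any number of inputs."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return relu(weighted_sum)


# Illustrative values only; they do not come from the slides.
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 2.0, 0.5]
y = neuron(weights, inputs)  # weighted sum is 0.5 - 2.0 + 1.0 = -0.5, so F gives 0.0
print(y)
```

Training a network amounts to adjusting the weights so that the outputs move toward the desired values.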

11
ARTIFICIAL NEURAL NETWORK (ANN)
A collection of simple, trainable mathematical units that
collectively learn complex functions
Input layer → Hidden layers → Output layer
Given sufficient training data, an artificial neural network can approximate very complex functions mapping raw data to output decisions
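How the layers compose can be sketched as follows, assuming fully connected layers with the max(0, x) activation throughout. The 2-3-1 layout and the hand-picked weights are hypothetical; a real network would learn its weights from training data.

```python
def relu(x):
    return max(0.0, x)


def layer(weight_rows, inputs):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [relu(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]


def forward(layers, inputs):
    """Pass the input through each layer in turn (input -> hidden -> output)."""
    values = inputs
    for weight_rows in layers:
        values = layer(weight_rows, values)
    return values


# Hypothetical 2-3-1 network with hand-picked (untrained) weights.
hidden = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
output = [[1.0, 2.0, 1.0]]
result = forward([hidden, output], [2.0, 1.0])
print(result)
```

Each hidden neuron computes its own simple function of the input; the output neuron combines them, which is how stacked simple units can represent more complex functions.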

12
DEEP NEURAL NETWORK (DNN)
Input → Result
Application components:
- Task objective: e.g. identify face
- Training data: 10-100M images
- Network architecture: ~10 layers, 1B parameters
- Learning algorithm: ~30 Exaflops, ~30 GPU days
Raw data → Low-level features → Mid-level features → High-level features
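The rough figures on this slide can be turned into back-of-the-envelope numbers. The sketch below assumes "~30 Exaflops" means about 30 × 10^18 floating-point operations in total and that each parameter is stored as a 32-bit float; neither assumption is stated on the slide, and the inputs are order-of-magnitude estimates.

```python
SECONDS_PER_DAY = 24 * 60 * 60

total_operations = 30e18   # "~30 Exaflops", read as total operations (assumption)
gpu_days = 30              # "~30 GPU days"
parameters = 1e9           # "1B parameters"
bytes_per_parameter = 4    # assumption: 32-bit floats

# Average rate one GPU would have to sustain to finish in the stated time.
sustained_flops = total_operations / (gpu_days * SECONDS_PER_DAY)

# Memory needed just to hold the parameters.
parameter_gigabytes = parameters * bytes_per_parameter / 1e9

print(f"Implied sustained rate: {sustained_flops / 1e12:.1f} TFLOP/s per GPU")
print(f"Parameter storage: {parameter_gigabytes:.0f} GB")
```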

13
DEEP LEARNING ADVANTAGES
- Robust
  - No need to design the features ahead of time – features are automatically learned to be optimal for the task at hand
  - Robustness to natural variations in the data is automatically learned
- Generalizable
  - The same neural net approach can be used for many different applications and data types
- Scalable
  - Performance improves with more data, method is massively parallelizable

14
CONVOLUTIONAL NEURAL NETWORK (CNN)
Inspired by the human visual cortex
Learns a hierarchy of visual features
Local pixel-level features are scale and translation invariant
Learns the “essence” of visual objects and generalizes well
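The idea behind local features can be illustrated with a plain 2D convolution: the same small kernel is applied at every position, so a feature is detected wherever it appears in the image. This is a minimal sketch assuming a hand-written vertical-edge kernel and tiny example images; in a CNN the kernels are learned, and pooling layers (not shown) add tolerance to small shifts.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers apply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# Hypothetical vertical-edge detector (dark-to-bright, left to right).
kernel = [[-1, 1],
          [-1, 1]]

# The same edge, placed at two different horizontal positions.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
shifted = [[0, 1, 1, 1],
           [0, 1, 1, 1],
           [0, 1, 1, 1]]

response = convolve2d(image, kernel)
shifted_response = convolve2d(shifted, kernel)
print(response)
print(shifted_response)
```

The peak response has the same strength in both cases and simply moves with the edge, because the kernel weights are shared across all positions.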

15
CONVOLUTIONAL NEURAL NETWORK (CNN)

16
DNNS DOMINATE IN PERCEPTUAL TASKS
Slide credit: Yann Lecun, Facebook & NYU

17
WHY IS DEEP LEARNING HOT NOW?
Three Driving Factors…
- Big Data Availability: 350 million images uploaded per day; 2.5 petabytes of customer data hourly; 100 hours of video uploaded every minute
- New DL Techniques
- GPU Acceleration

19
GPUs — THE PLATFORM FOR DEEP LEARNING
Image Recognition Challenge: 1.2M training images • 1000 object categories
[Chart: GPU Entries per year, 2010-2014; labeled values: 4, 60, 110]
[Example image labels: bird, frog, person, hammer, flower pot, power drill, person, car, helmet, motorcycle, person, dog, chair]

20
GPU-ACCELERATED DEEP LEARNING

21
GPUS MAKE DEEP LEARNING ACCESSIBLE
“Now You Can Build Google’s $1M Artificial Brain on the Cheap”
              GOOGLE DATACENTER     STANFORD AI LAB
Servers       1,000 CPU servers     3 GPU-accelerated servers
Processors    2,000 CPUs            12 GPUs
Cores         16,000                18,432
Power         600 kWatts            4 kWatts
Cost          $5,000,000            $33,000
Deep learning with COTS HPC systems, A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro, ICML 2013
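The comparison on this slide reduces to two ratios. The sketch below only does arithmetic on the numbers shown; the dictionary layout and key names are illustrative.

```python
# Figures as given on the slide.
google = {"cost_usd": 5_000_000, "power_kw": 600, "cores": 16_000}
stanford = {"cost_usd": 33_000, "power_kw": 4, "cores": 18_432}

cost_ratio = google["cost_usd"] / stanford["cost_usd"]
power_ratio = google["power_kw"] / stanford["power_kw"]

print(f"Cost ratio:  {cost_ratio:.0f}x")
print(f"Power ratio: {power_ratio:.0f}x")
```

By these figures the GPU-accelerated setup costs and draws roughly 150 times less, with a comparable core count.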

22
WHY ARE GPUs GOOD FOR DEEP LEARNING?
GPUs deliver:
- same or better prediction accuracy
- faster results
- smaller footprint
- lower power
- lower cost
                      Neural Networks   GPUs
Inherently Parallel   ✓                 ✓
Matrix Operations     ✓                 ✓
FLOPS                 ✓                 ✓
Bandwidth             ✓                 ✓
[Lee, Ranganath & Ng, 2007]

23
DL software landscape
NVIDIA DIGITS

24
HOW TO WRITE APPLICATIONS USING DL
Hardware – which can accelerate DL building blocks
System Software (Drivers)
Libraries (key compute-intensive, commonly used building blocks)
Deep Learning Frameworks (industry-standard or research frameworks)
END USER APPLICATIONS: Speech Understanding, Image Analysis, Language Processing

25
HOW NVIDIA IS HELPING THE DL STACK
Hardware – which can accelerate DL building blocks: GPU, world’s best DL hardware
System Software (Drivers): CUDA, best parallel programming toolkit
Libraries (key compute-intensive, commonly used building blocks): performance libraries (cuDNN, cuBLAS), highly optimized
Deep Learning Frameworks (industry-standard or research frameworks): GPU-accelerated DL frameworks (Caffe, Torch, Theano)
END USER APPLICATIONS: DIGITS; Speech Understanding, Image Analysis, Language Processing

26
CUDNN V2 - PERFORMANCE
v3 RC available to Registered Developers
CPU is a 16-core Haswell E5-2698 at 2.3 GHz, with 3.6 GHz Turbo
GPU is NVIDIA Titan X

27
HOW GPU ACCELERATION WORKS
Application Code = GPU + CPU
GPU: Compute-Intensive Functions (5% of code, ~80% of run-time)
CPU: Rest of Sequential CPU Code
28
DIGITS
DEEP GPU TRAINING SYSTEM FOR DATA SCIENTISTS
Design DNNs
Visualize activations
Manage multiple trainings
USER INTERFACE: Process Data, Configure DNN, Monitor Progress, Visualize Layers
Theano, Torch, Caffe
cuDNN, cuBLAS
CUDA
GPU HW: GPU, Multi-GPU, GPU Cluster, Cloud

29
DIGITS
Interactive Deep Learning GPU Training System
Data Scientists & Researchers:
Quickly design the best deep neural network (DNN) for your data
Visually monitor DNN training quality in real-time
Manage training of many DNNs in parallel on multi-GPU systems
DIGITS 2 - Accelerate training of a single DNN using multiple GPUs
https://developer.nvidia.com/digits

30
DIGITS WEB INTERFACE & API DRIVEN
Process Data, Configure DNN, Monitor Progress, Visualize Layers
Test Image

31
NVIDIA DIGITS
[Chart: possible speedup with multiple GPUs]
Training speedup achieved with DIGITS on multiple GeForce TITAN X GPUs in a DIGITS DevBox. These results were obtained with the Caffe framework and a batch size of 128.
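The slides do not describe how the work is divided across GPUs. One common scheme is data parallelism, sketched below as an assumption rather than as a description of DIGITS 2 or Caffe: each GPU gets an equal share of the batch, computes a gradient on its share, and the gradients are averaged. The one-parameter model, the function names, and the synthetic data are all hypothetical, and the "GPUs" are simulated by a plain loop.

```python
def gradient(weight, batch):
    """Mean gradient of squared error for the model y = weight * x."""
    return sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(weight, batch, num_gpus):
    """Split the batch evenly, compute per-shard gradients, then average."""
    shard_size = len(batch) // num_gpus
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_gpus)]
    shard_gradients = [gradient(weight, shard) for shard in shards]  # one per GPU
    return sum(shard_gradients) / num_gpus


# Synthetic batch of 128 samples from y = 3x (illustrative data only).
batch = [(float(x), 3.0 * x) for x in range(128)]
single = gradient(0.5, batch)
multi = data_parallel_gradient(0.5, batch, num_gpus=4)
print(single, multi)
```

With equal-sized shards the averaged gradient matches the single-device gradient, so the update is the same while each device processes only a fraction of the batch.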

33
DEEP LEARNING DEPLOYMENT WORKFLOW

34
DEEP LEARNING LAB SERIES SCHEDULE
- 7/22 Class #1 - Introduction to Deep Learning
- 7/29 Office Hours for Class #1
- 8/5 Class #2 - Getting Started with DIGITS interactive training system for image classification
- 8/19 Class #3 - Getting Started with the Caffe Framework
- 9/2 Class #4 - Getting Started with the Theano Framework
- 9/16 Class #5 - Getting Started with the Torch Framework
- More information available at developer.nvidia.com/deep-learning-courses
Recordings online

35
HANDS-ON LAB
1. Create an account at nvidia.qwiklab.com
2. Go to “Introduction to Deep Learning” lab at bit.ly/dlnvlab1
3. Start the lab and enjoy!
- Only requires a supported browser, no NVIDIA GPU necessary!
- Lab is free until end of Deep Learning Lab series

36
USEFUL LINKS
- Deep Learning Lab Course information & recordings: developer.nvidia.com/deep-learning-courses
- Recorded presentations from past conferences: www.gputechconf.com/gtcnew/on-demand-gtc.php
- Parallel Forall (GPU Computing Technical blog): devblogs.nvidia.com/parallelforall
- Become a Registered Developer: developer.nvidia.com/programs/cuda/register
