Big Things Conference 2019 - Distributed Deep Learning with Keras/TensorFlow on Apache Spark

DISTRIBUTED DEEP LEARNING WITH KERAS AND TENSORFLOW ON APACHE SPARK: YES, YOU CAN!
GUGLIELMO IOZZIA
MSD
MADRID, NOVEMBER 21ST 2019
#guglielmoiozzia

ABOUT ME
Currently at
Previously at
I got some awards lately
Author
I love cooking
DataOps Champion

MSD IRELAND
50+ years
Approx. 2,000 employees
$2.5 billion investment to date
Approx. 50% of MSD’s top 20 products manufactured here
Exports to 60+ countries
€6.1 billion turnover in 2017
2017: +300 jobs and €280m investment
MSD Biotech, Dublin, coming in 2021
https://www.msd-ireland.com/

CORE TOPICS
• Deep Learning: What is it?
• Keras and TensorFlow: 2 of the most popular frameworks for DL
• Why Distributed Deep Learning on Spark? Why is it so difficult?
• DL in Python on the JVM: Why and how?

DEEP LEARNING
It is a subset of Machine Learning based on Multilayer Neural Networks (MNNs).
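As a minimal illustration of what "multilayer" means, the sketch below runs a forward pass through a network with one hidden layer in plain Python. The layer sizes, weights and the ReLU activation are arbitrary choices made for the example; none of them come from the talk.

```python
# Forward pass through a tiny multilayer network (illustrative values only).

def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs):
    # Hidden layer: 2 inputs -> 3 units, followed by a non-linearity.
    hidden = relu(dense(inputs,
                        [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]],
                        [0.0, 1.0, -1.0]))
    # Output layer: 3 hidden units -> 1 output.
    return dense(hidden, [[1.0, 2.0, -1.0]], [0.5])


output = forward([2.0, 1.0])
```

Stacking more `dense` + activation pairs between input and output is what makes a network "deep"; frameworks such as Keras or DL4J provide these layers, plus training, out of the box.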

DEEP LEARNING
http://www.asimovinstitute.org/wp-content/uploads/2019/04/NeuralNetworkZoo20042019.png

TENSORFLOW
It is an end-to-end open source platform for ML. It has a comprehensive, flexible ecosystem of tools, libraries and community resources for researchers and developers.
https://www.tensorflow.org/

KERAS
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It allows for easy prototyping and runs seamlessly on CPUs and GPUs.
https://keras.io/

KERAS & TENSORFLOW
Starting from TensorFlow r1.14

APACHE SPARK
Speed
It achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
Ease of Use
It offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells.
Generality
Combine SQL, streaming, and complex analytics.
Runs Everywhere
It runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources.

WHEN WOULD YOU NEED TO TRAIN MNNS IN SPARK?
• Availability of a cluster of machines for training
• Scarcity of GPUs
• Very large networks
• Huge data sets
By the way, DL4J isn’t for Spark only: you can use it on a single machine with multiple GPUs or multiple physical processors.

CHALLENGES OF TRAINING MNNS IN SPARK
• Different execution models between Spark and the DL frameworks
• GPU configuration and management
• Performance
• Accuracy

WHY DISTRIBUTED DL ON THE JVM?

DEEPLEARNING4J
It is an Open Source, distributed, Deep Learning framework written for JVM languages.
It is integrated with Hadoop and Apache Spark.
It can be used on distributed GPUs and CPUs.

WHY DISTRIBUTED DL ON THE JVM?
TensorFlow

DL4J MODULES
• DataVec
• Arbiter
• NN
• Datasets
• RL4J
• DL4J-Spark
• Model Import
• ND4J: an Open Source linear algebra and matrix manipulation library which supports n-dimensional arrays and is integrated with Apache Hadoop and Spark.

DL4J + APACHE SPARK
• DL4J provides a high-level API to design, configure, train and evaluate MNNs.
• Spark performance is excellent, in particular for ETL/streaming, but for the computation involved in MNN training, some data transformation/aggregation needs to be done in a low-level language.
• DL4J uses ND4J, which is backed by a native C++ library and exposes a high-level API to JVM developers (Java, with Scala bindings available through ND4S).

MODEL IMPORT IN DL4J
Keras (imported through KerasModelImport):
• Train the Model
• Save it as .h5
• Load Model and Weights
• Load New Data
• Predict
TensorFlow (imported through TFGraphMapper):
• Train the Model
• Save it as .pb
• Load Model and Weights
• Load New Data
• Predict
Transfer Learning

KERAS MODEL IMPORT: SUPPORTED FEATURES
• Layers
• Losses
• Activations
• Initializers
• Regularizers
• Constraints
• Metrics
• Optimizers

MODEL IMPORT IN DL4J: EXAMPLE
Keras:
• Train the Model
• Save it as .h5
• Load Model and Weights
• Load New Data
• Predict
Example: import the VGG16 model and test it.

DATA PARALLELISM AND MODEL PARALLELISM

HOW TRAINING HAPPENS IN SPARK WITH DL4J
• Parameter Averaging (DL4J 1.0.0-alpha)
• Asynchronous SGD (DL4J 1.0.0-beta+)
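Parameter averaging, the first of the two approaches named on this slide, can be sketched in a few lines of plain Python. This is a toy, single-process simulation under stated assumptions, not DL4J's implementation: the one-parameter model y = w * x, the data, the learning rate and the two "workers" are all made up for illustration, and no Spark is involved.

```python
# Synchronous parameter averaging on a toy 1-parameter model y = w * x.
# Data, learning rate and number of workers are illustrative assumptions.

def local_sgd(w, shard, learning_rate=0.01, steps=5):
    """Run a few SGD passes over one worker's shard, starting from weight w."""
    for _ in range(steps):
        for x, y in shard:
            gradient = 2.0 * (w * x - y) * x  # d/dw of the squared error
            w -= learning_rate * gradient
    return w


def parameter_averaging(shards, rounds=20, initial_w=0.0):
    """Each round: broadcast w, train locally on every shard, average the results."""
    w = initial_w
    for _ in range(rounds):
        worker_weights = [local_sgd(w, shard) for shard in shards]
        w = sum(worker_weights) / len(worker_weights)
    return w


# Data generated from y = 3 * x, split into two "worker" shards.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]
trained_w = parameter_averaging(shards)
```

Every worker waits for the averaging step before starting the next round, which is the synchronization cost that asynchronous approaches try to avoid by exchanging updates without a global barrier.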

HOW TRAINING HAPPENS IN SPARK WITH DL4J
The key classes users should be familiar with to get started with distributed training in DL4J are:
• TrainingMaster: It specifies how distributed training will be conducted in practice. Implementations include Gradient Sharing or Parameter Averaging.
• SparkDl4jMultiLayer and SparkComputationGraph: They are wrappers around the MultiLayerNetwork and ComputationGraph classes in DL4J that enable the functionality related to distributed training.
• RDD<DataSet> and RDD<MultiDataSet>: Spark RDDs with DL4J’s DataSet or MultiDataSet classes that define the source of the training or evaluation data.

RE-TRAIN AN IMPORTED MODEL
Define the Spark Context
Choose the TrainingMaster implementation
Create the Spark network
Start the training
Get the model configuration

MEMORY UTILIZATION: SOMETHING TO TAKE CARE OF
Take Care of the Off-Heap Memory!
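DL4J keeps its array data outside the JVM heap (ND4J allocates it through JavaCPP), so a Spark executor has to be sized for heap plus off-heap memory. The sketch below derives a consistent set of settings from two numbers. The helper name is hypothetical, the sizes and the 1 GB headroom are arbitrary, and the property names (JavaCPP's org.bytedeco.javacpp.maxbytes and org.bytedeco.javacpp.maxphysicalbytes, Spark's spark.executor.memoryOverhead) should be checked against the JavaCPP and Spark versions in use.

```python
# Sizing sketch for a Spark executor running DL4J (assumed values throughout).

def dl4j_executor_settings(heap_gb, off_heap_gb, headroom_gb=1):
    """Return Spark properties sizing an executor for heap + off-heap use."""
    # Upper bound for the whole process: heap + off-heap + some headroom.
    physical_gb = heap_gb + off_heap_gb + headroom_gb
    java_options = (
        f"-Dorg.bytedeco.javacpp.maxbytes={off_heap_gb}G "
        f"-Dorg.bytedeco.javacpp.maxphysicalbytes={physical_gb}G"
    )
    return {
        # JVM heap of the executor.
        "spark.executor.memory": f"{heap_gb}g",
        # Extra container memory so the off-heap allocations fit.
        "spark.executor.memoryOverhead": f"{off_heap_gb + headroom_gb}g",
        # Off-heap limits passed to the executor JVM.
        "spark.executor.extraJavaOptions": java_options,
    }


settings = dl4j_executor_settings(heap_gb=4, off_heap_gb=8)
```

The point of the arithmetic is that the container requested from the cluster manager must cover both the heap and the off-heap limit; otherwise executors risk being killed for exceeding their memory allocation.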

More on DL with DL4J on Spark in my book
http://tinyurl.com/y9jkvtuy

Thanks!
Any questions?
You can find me at
@GuglielmoIozzia
https://ie.linkedin.com/in/giozzia
googlielmo.blogspot.com
