Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark

Tim Hunter (Software Engineer)
Sue Ann Hong (Software Engineer)
Spark Meetup - August 22nd, 2017

About Us
• Sue Ann Hong
• Software engineer @ Databricks
• Ph.D. from CMU in Machine Learning
• Tim Hunter
• Software engineer @ Databricks
• Ph.D. from UC Berkeley in Machine Learning
• Very early Spark user

Today
• Deep Learning at scale made easy: the vision
• Processing images with DL Pipelines
• Building simple Deep Learning models with transfer learning
• Model deployment via SQL
More advanced topics will be covered during the Q&A and other
meetups.

What is Deep Learning?
• A set of machine learning techniques that use layers that
transform numerical inputs
• Classification
• Regression
• Arbitrary mapping
• Popular in the 1980s as Neural Networks
• Recently came back thanks to advances in data collection,
computation techniques, and hardware.
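
To make "layers that transform numerical inputs" concrete, here is a minimal sketch in plain Python. It assumes two hand-written layers with made-up weights (`W1`, `B1`, `W2`, `B2` are hypothetical; in practice they are learned) and uses no deep learning library.

```python
# Toy illustration only: two hand-written layers, no deep learning library.

def dense(inputs, weights, biases):
    """One layer: a weighted sum of the inputs plus a bias, per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linearity applied between layers: negative values become 0."""
    return [max(0.0, v) for v in values]


# Hypothetical weights, chosen by hand for the example (normally learned).
W1 = [[1.0, -1.0], [0.5, 0.5]]
B1 = [0.0, -1.0]
W2 = [[2.0, 1.0]]
B2 = [0.5]


def forward(inputs):
    """Pass the inputs through layer 1, the non-linearity, then layer 2."""
    hidden = relu(dense(inputs, W1, B1))
    return dense(hidden, W2, B2)


output = forward([3.0, 1.0])  # a list with one number
```

Depending on how the last layer is read, the same structure covers classification, regression, or an arbitrary mapping.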

Success of Deep Learning
• Tremendous success for applications with complex data
• AlphaGo
• Image interpretation
• Automatic translation
• Speech recognition

But still requires a lot of effort
• Low level APIs with steep learning curve
• Tedious to distribute computations
• Not well integrated with other enterprise tools
• No exact science around deep learning
• Success requires many engineer-hours

Deep Learning in industry
• Currently limited adoption
• Huge potential beyond the industrial giants
• How do we accelerate the road to massive availability?

A typical Deep Learning workflow
• Load data (images, text, time series, …)
• Interactive work
• Train
• Select an architecture for a neural network
• Optimize the weights of the NN
• Evaluate results, potentially re-train
• Apply:
• Pass the data through the NN to produce new features or output

How can Spark help?
• A lot of libraries available for Deep Learning in Spark
• TensorFlowOnSpark, BigDL, …
• Goes from simple to very advanced
• See our previous meetup talks for more detail
• Spark is great at scaling out computations
• Distribute the transforms
• Manage the training computation
• Spark MLlib Pipelines
• Simple, concise API to capture the ML workflow

Deep Learning Pipelines:
Deep Learning with Simplicity
• Open-source Databricks library:
https://github.com/databricks/spark-deep-learning
• Focuses on ease of use and integration, without sacrificing
performance
• Scales out common tasks
• Integrates with Spark APIs
• Primary language: Python

Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
• Image loading in Spark
• Deploying models in SQL
• Transfer learning
• Distributed tuning
• Distributed prediction
• Pre-trained models
This talk covers four of the items above (marked ✓ on the original slide).

Image processing with DL
Pipelines and Databricks

Adds support for images in Spark
• ImageSchema, reader, conversion functions to/from numpy
arrays
• Most of the tools we’ll describe work on ImageSchema columns
from sparkdl import readImages
image_df = readImages(sample_img_dir)

Applying popular models
• Popular pre-trained models accessible through MLlib
Transformers
from sparkdl import DeepImagePredictor
predictor = DeepImagePredictor(inputCol="image",
                               outputCol="predicted_labels",
                               modelName="InceptionV3")
predictions_df = predictor.transform(image_df)

Fast model training via
transfer learning

Example: Identify the James Bond cars

Transfer Learning
[Figure: a network's SoftMax output (GIANT PANDA 0.9, RED PANDA 0.05, RACCOON 0.01, …); diagram labels: DeepImageFeaturizer, Classifier]
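
The SoftMax step in the figure turns a network's raw scores into probabilities such as 0.9 / 0.05 / 0.01. A minimal sketch of that computation, assuming made-up scores (the values below are hypothetical and do not come from a real model):

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three labels (not taken from a real model).
labels = ["giant panda", "red panda", "raccoon"]
scores = [5.0, 2.0, 0.5]

probs = softmax(scores)
best_label = labels[probs.index(max(probs))]
```

In transfer learning this final classification step is the part that gets replaced; the layers before it are reused as a featurizer.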

MLlib primer
• MLlib: the machine learning library included with Spark
• Transformer
• Transforms the data: takes a Spark dataframe and appends a new column
• Estimator
• Produces a model (fit)
• Pipeline: sequence of transformers and estimators
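
The three concepts above can be sketched in plain Python. This is a toy model of the idea, not the `pyspark.ml` classes: a "dataframe" is just a list of dicts, and `Doubler`, `MeanCenterer`, and the toy `Pipeline` below are hypothetical names made up for the illustration.

```python
class Doubler:
    """Transformer: appends a new column computed from an existing one."""

    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col

    def transform(self, rows):
        return [dict(r, **{self.output_col: 2 * r[self.input_col]}) for r in rows]


class MeanCenterer:
    """Estimator: fit() learns the column mean and returns a model."""

    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col

    def fit(self, rows):
        mean = sum(r[self.input_col] for r in rows) / len(rows)
        return MeanCentererModel(self.input_col, self.output_col, mean)


class MeanCentererModel:
    """Model produced by fit(); it is itself a transformer."""

    def __init__(self, input_col, output_col, mean):
        self.input_col, self.output_col, self.mean = input_col, output_col, mean

    def transform(self, rows):
        return [dict(r, **{self.output_col: r[self.input_col] - self.mean})
                for r in rows]


class Pipeline:
    """Sequence of transformers and estimators, fitted in order."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted first; transformers are used as they are.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


data = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
pipeline = Pipeline([Doubler("x", "x2"), MeanCenterer("x2", "centered")])
model = pipeline.fit(data)
result = model.transform(data)
```

Each stage only adds a column, so the input rows stay untouched and stages can be chained freely.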

Transfer Learning as a Pipeline
[Figure: MLlib Pipeline with stages Image Loading, Preprocessing, DeepImageFeaturizer, Logistic Regression]
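
The idea behind this pipeline, a fixed featurizer followed by a small trainable classifier, can be sketched without Spark. Everything below is an assumption made for illustration: `frozen_featurizer` is a hypothetical stand-in for a pre-trained network such as InceptionV3, the "images" are made-up lists of four pixel intensities, and the logistic regression is a hand-written gradient descent rather than the MLlib estimator.

```python
import math


def frozen_featurizer(pixels):
    """Stand-in for a pre-trained network with its last layer removed.
    Hypothetical: real features would come from a model such as InceptionV3."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, steps=2000, lr=0.5):
    """Fit only the small classifier; the featurizer is never updated."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        # Plain batch gradient descent on the average log-loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(pixels, weights, bias):
    x = frozen_featurizer(pixels)
    score = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0


# Made-up "images": four pixel intensities in [0, 1]; label 1 = bright, 0 = dark.
images = [[0.9, 0.8, 1.0, 0.9], [0.8, 0.7, 0.9, 0.8],
          [0.1, 0.2, 0.0, 0.1], [0.2, 0.1, 0.3, 0.2]]
labels = [1, 1, 0, 0]

features = [frozen_featurizer(img) for img in images]
weights, bias = train_logistic_regression(features, labels)
predictions = [predict(img, weights, bias) for img in images]
```

Only the two weights and the bias are trained, which is why transfer learning needs far less data and compute than training the whole network.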

Sharing and exporting Deep
Learning models

Model Export and Sharing
[Figure: Deep Learning Model, Classifier]

Shipping predictors in SQL
Take a trained model / Pipeline, register a SQL UDF usable by
anyone in the organization
registerKerasUDF("my_object_recognition_function",
                 keras_model_file="/mymodels/007model.h5")
In Spark SQL:
select image, my_object_recognition_function(image) as objects
from traffic_imgs
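
The register-then-query pattern can be tried locally with Python's built-in `sqlite3` module. This is an analogy only: it uses SQLite rather than Spark SQL, and the function body is a hypothetical stand-in for a trained model (it just inspects the file name). The table and function names mirror the slide.

```python
import sqlite3


def my_object_recognition_function(image):
    """Hypothetical stand-in for a trained model: labels an image by its name."""
    return "car" if "car" in image else "unknown"


conn = sqlite3.connect(":memory:")

# Register the Python function under a SQL-visible name (1 = number of arguments).
conn.create_function("my_object_recognition_function", 1,
                     my_object_recognition_function)

# Made-up table contents for the example.
conn.execute("create table traffic_imgs (image text)")
conn.executemany("insert into traffic_imgs values (?)",
                 [("car_001.png",), ("tree_002.png",)])

rows = conn.execute(
    "select image, my_object_recognition_function(image) as objects "
    "from traffic_imgs order by image"
).fetchall()
conn.close()
```

Once registered, the function is called like any built-in SQL function, so whoever writes the query needs no knowledge of the model behind it.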

Deep Learning without Deep Pockets
• Simple API for Deep Learning, integrated with MLlib
• Scales common tasks with transformers and estimators
• Embeds Deep Learning models in MLlib and SparkSQL
• Early release of Deep Learning Pipelines
https://github.com/databricks/spark-deep-learning

Deep Learning Pipelines - future
In progress
• Hyper-parameter tuning for Keras models
• Official image support in Spark
• Scala API
(Potential) future work
• Text models
• Support for more backends, e.g. MXNet, PyTorch, BigDL

Resources
Blog posts & webinars — http://databricks.com/blog
• Deep Learning Pipelines
• GPU acceleration in Databricks
• BigDL on Databricks
• Deep Learning and Apache Spark
Docs for Deep Learning on Databricks — http://docs.databricks.com
• Getting started
• Deep Learning Pipelines Example
• Spark integration

Thank You!
Questions?
Happy Sparking & Deep Learning!
