Build, Scale, and Deploy Deep Learning Pipelines with Ease
Tim Hunter (Software Engineer)
Sue Ann Hong (Software Engineer)
Jules S. Damji (Spark Community Evangelist)
July 27, 2017

Agenda
• Logistics
• Databricks Overview
• Build, Scale and Deploy Deep Learning Pipelines with Ease
• Q & A

Logistics
• We can’t hear you…
• Recording will be available...
• Slides will be available...
• Queue up Questions ….
• Orange Button for Tech Support difficulties...

About Databricks
TEAM: Started the Spark project (now Apache Spark) at UC Berkeley in 2009
PRODUCT: Unified Analytics Platform
MISSION: Making Big Data Simple

Accelerate innovation by unifying data science, engineering and business.
Unified Analytics Platform
• Unified infrastructure
• Unified experience across teams
• Unified analytic workflows

The Unified Analytics Platform

About Us
• Sue Ann Hong
• Software engineer @ Databricks
• Ph.D. from CMU in Machine Learning
• Contributor to MLlib
• Author of Deep Learning Pipelines

About Us
• Tim Hunter
• Software engineer @ Databricks
• Ph.D. from UC Berkeley in Machine Learning
• Very early Spark user
• Contributor to MLlib
• Author of Deep Learning Pipelines, TensorFrames and
GraphFrames

Today
• Deep Learning at scale made easy: the vision
• Processing images with DL Pipelines
• Building simple Deep Learning models with transfer learning
• Model deployment via SQL
Further advanced topics will be covered in our next webinar.

What is Deep Learning?
• A set of machine learning techniques that use layers that
transform numerical inputs
• Classification
• Regression
• Arbitrary mapping
• Popular in the 1980s under the name Neural Networks
• Came back recently thanks to advances in data collection,
computation techniques, and hardware.
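To make "layers that transform numerical inputs" concrete, here is a minimal sketch of a two-layer classifier in plain Python. The weights are hand-picked, hypothetical values chosen only for illustration; a real network would learn them during training.

```python
import math


def dense(inputs, weights, biases):
    """One layer: a weighted sum of the inputs plus a bias, per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linearity applied between layers: negative values become 0."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights, not learned from data.
HIDDEN_W = [[1.0, -1.0], [0.5, 0.5]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[1.0, 0.0], [0.0, 1.0]]
OUTPUT_B = [0.0, 0.0]


def classify(inputs):
    """Pass the inputs through a hidden layer, then an output layer."""
    hidden = relu(dense(inputs, HIDDEN_W, HIDDEN_B))
    return softmax(dense(hidden, OUTPUT_W, OUTPUT_B))


probabilities = classify([2.0, 1.0])
```

Each layer is only a numerical transform of the previous layer's output; classification, regression, and arbitrary mappings differ mainly in what the last layer produces.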

Success of Deep Learning
• Tremendous success for applications with complex data
• AlphaGo
• Image interpretation
• Automatic translation
• Speech recognition

But still requires a lot of effort
• Low level APIs with steep learning curve
• Tedious to distribute computations
• Not well integrated with other enterprise tools
• No exact science around deep learning
• Success requires many engineer-hours

Deep Learning in industry
• Currently limited adoption
• Huge potential beyond the industrial giants
• How do we accelerate the road to massive availability?

A typical Deep Learning workflow
• Load data (images, text, time series, …)
• Interactive work
• Train
• Select an architecture for a neural network
• Optimize the weights of the NN
• Evaluate results, potentially re-train
• Apply:
• Pass the data through the NN to produce new features or output

How can Spark help?
• Many Deep Learning libraries are available for Spark
• TensorFlowOnSpark, BigDL, …
• They range from simple to very advanced
• See our previous webinar for more detail
• Spark is great at scaling out computations
• Distribute the transforms
• Manage the training computation
• Spark MLlib Pipelines
• Simple, concise API to capture the ML workflow

Deep Learning Pipelines:
Deep Learning with Simplicity
• Open-source Databricks library:
https://github.com/databricks/spark-deep-learning
• Focuses on ease of use and integration, without sacrificing
performance
• Scales out common tasks
• Integrates with Spark APIs
• Primary language: Python

Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
• Image loading in Spark
• Deploying models in SQL
• Transfer learning
• Distributed tuning
• Distributed prediction
• Pre-trained models
This webinar: ✓ ✓ ✓ ✓

Image processing with DL
Pipelines and Databricks

Adds support for images in Spark
• ImageSchema, reader, conversion functions to/from numpy
arrays
• Most of the tools we’ll describe work on ImageSchema columns
from sparkdl import readImages
image_df = readImages(sample_img_dir)
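The conversion idea can be sketched without Spark: an image row stores its dimensions plus a flat byte buffer, and the conversion functions map between that row and an array. In this sketch the field names (height, width, nChannels, data) are assumptions chosen for illustration, and nested Python lists stand in for numpy arrays; it is not the library's implementation.

```python
from collections import namedtuple

# Hypothetical row layout: dimensions plus raw pixel bytes.
ImageRow = namedtuple("ImageRow", ["height", "width", "nChannels", "data"])


def array_to_image(pixels):
    """Convert pixels[row][col][channel] (values 0..255) to an ImageRow."""
    height = len(pixels)
    width = len(pixels[0])
    n_channels = len(pixels[0][0])
    # Flatten in row-major order: rows, then columns, then channels.
    flat = bytes(value for row in pixels for pixel in row for value in pixel)
    return ImageRow(height, width, n_channels, flat)


def image_to_array(image):
    """Rebuild the nested pixel array from the flat byte buffer."""
    values = iter(image.data)
    return [[[next(values) for _ in range(image.nChannels)]
             for _ in range(image.width)]
            for _ in range(image.height)]


# A 2x2 RGB image used to check the round trip.
sample_pixels = [[[255, 0, 0], [0, 255, 0]],
                 [[0, 0, 255], [10, 20, 30]]]
sample_image = array_to_image(sample_pixels)
round_trip = image_to_array(sample_image)
```

Keeping the dimensions next to the bytes is what lets downstream tools interpret the buffer without reopening the original file.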

Applying popular models
• Popular pre-trained models accessible through MLlib
Transformers
from sparkdl import DeepImagePredictor
predictor = DeepImagePredictor(inputCol="image",
                               outputCol="predicted_labels",
                               modelName="InceptionV3")
predictions_df = predictor.transform(image_df)

Fast model training via
transfer learning

Example: Identify the James Bond cars

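Transfer Learning
[Diagram: a pre-trained network ending in a SoftMax layer (e.g. GIANT PANDA 0.9, RED PANDA 0.05, RACCOON 0.01, …); DeepImageFeaturizer keeps the earlier layers and a new Classifier replaces the SoftMax output.]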

MLlib primer
• MLlib: the machine learning library included with Spark
• Transformer
• Transforms the data: takes a Spark dataframe and appends a new column
• Estimator
• Produces a model (fit)
• Pipeline: sequence of transformers and estimators
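The three concepts can be sketched with toy classes in plain Python. This is not the MLlib API: lists of dicts stand in for Spark dataframes, and every class name here is hypothetical. It only shows how a Transformer appends a column, how an Estimator's fit produces a model, and how a Pipeline chains them.

```python
class AddLength:
    """Transformer: appends a 'length' column computed from 'text'."""

    def transform(self, rows):
        return [dict(row, length=len(row["text"])) for row in rows]


class ThresholdModel:
    """Model produced by fit(); it is itself a Transformer."""

    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [dict(row, prediction=int(row["length"] > self.threshold))
                for row in rows]


class ThresholdClassifier:
    """Estimator: fit() learns a threshold (the mean length) from the data."""

    def fit(self, rows):
        mean_length = sum(row["length"] for row in rows) / len(rows)
        return ThresholdModel(mean_length)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Sequence of transformers and estimators."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):   # estimators are fitted first
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


training_rows = [{"text": "hi"}, {"text": "hello world"}]
model = Pipeline([AddLength(), ThresholdClassifier()]).fit(training_rows)
predictions = model.transform([{"text": "a long sentence"}, {"text": "no"}])
```

Fitting the pipeline returns a model that is a plain chain of transformers, which is why a trained pipeline can be applied to new data in one call.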

Transfer Learning as a Pipeline
MLlib Pipeline: Image Loading -> Preprocessing -> DeepImageFeaturizer -> Logistic Regression
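The pipeline above can be sketched in code roughly as follows. This assumes pyspark and sparkdl are installed; the column names ("image", "features", "label") and the regularization settings are illustrative assumptions, and the imports sit inside the function so that defining it does not require Spark.

```python
def build_transfer_learning_pipeline(label_col="label"):
    """Return an unfitted MLlib Pipeline: featurizer then logistic regression.

    Requires pyspark and sparkdl at call time (assumed to be installed).
    """
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from sparkdl import DeepImageFeaturizer

    # Pre-trained network used as a fixed feature extractor.
    featurizer = DeepImageFeaturizer(inputCol="image",
                                     outputCol="features",
                                     modelName="InceptionV3")
    # Only this small classifier is trained on the new labels.
    # Hyper-parameter values are illustrative, not tuned.
    classifier = LogisticRegression(maxIter=20, regParam=0.05,
                                    elasticNetParam=0.3, labelCol=label_col)
    return Pipeline(stages=[featurizer, classifier])


# Usage (train_df is a hypothetical dataframe with "image" and "label" columns):
#   model = build_transfer_learning_pipeline().fit(train_df)
#   predictions_df = model.transform(test_df)
```

Because only the logistic regression is fitted, training needs far less data and compute than training the full network.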

Sharing and exporting Deep
Learning models

Classifier
Deep Learning Model
Model Export and Sharing

Shipping predictors in SQL
Take a trained model / Pipeline and register a SQL UDF usable by
anyone in the organization:
registerKerasUDF("my_object_recognition_function",
                 keras_model_file="/mymodels/007model.h5")
In Spark SQL:
select image, my_object_recognition_function(image) as objects
from traffic_imgs
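The register-then-query pattern can be tried without a cluster. The sketch below swaps in SQLite from the Python standard library instead of Spark SQL, and a hypothetical file-name rule instead of a trained Keras model. It only illustrates how a Python function becomes callable from SQL.

```python
import sqlite3


def recognize_object(image_name):
    """Stand-in for a trained model: a hypothetical rule on the file name."""
    return "car" if "car" in image_name else "unknown"


connection = sqlite3.connect(":memory:")

# Register the Python function under a SQL-visible name (one argument).
connection.create_function("my_object_recognition_function", 1,
                           recognize_object)

# Hypothetical table of image names standing in for traffic_imgs.
connection.execute("CREATE TABLE traffic_imgs (image TEXT)")
connection.executemany("INSERT INTO traffic_imgs VALUES (?)",
                       [("car_001.jpg",), ("tree_002.jpg",)])

# Anyone who can write SQL can now apply the "model".
rows = connection.execute(
    "SELECT image, my_object_recognition_function(image) AS objects "
    "FROM traffic_imgs ORDER BY image"
).fetchall()
```

The person writing the query needs no knowledge of how the function was built, which is the point of shipping predictors as UDFs.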

Deep Learning without Deep Pockets
• Simple API for Deep Learning, integrated with MLlib
• Scales common tasks with transformers and estimators
• Embeds Deep Learning models in MLlib and SparkSQL
• Early release of Deep Learning Pipelines
https://github.com/databricks/spark-deep-learning

Deep Learning Pipelines - future
In progress
• Hyper-parameter tuning for Keras models
• Official image support in Spark
Potential future work
• Scala API
• Text models
• Support for more backends, e.g. MXNet, PyTorch, BigDL

Resources
Blog posts & webinars: http://databricks.com/blog
• Deep Learning Pipelines
• GPU acceleration in Databricks
• BigDL on Databricks
• Deep Learning and Apache Spark
Docs for Deep Learning on Databricks: http://docs.databricks.com
• Getting started
• Deep Learning Pipelines Example
• Spark integration

Thank You!
Questions?
Happy Sparking & Deep Learning!

UNIFIED ANALYTICS PLATFORM
Try Apache Spark in Databricks!
• Collaborative cloud environment
• Free version (community edition)
DATABRICKS RUNTIME 3.0
• Apache Spark - optimized for the cloud
• Caching and optimization layer - DBIO
• Enterprise security - DBES
Try for free today
databricks.com
