Build, Scale, and Deploy Deep
Learning Pipelines with Ease
Using Apache Spark
Tim Hunter (Software Engineer)
Sue Ann Hong (Software Engineer)
Spark Meetup - August 22nd, 2017
About Us
• Sue Ann Hong
• Software engineer @ Databricks
• Ph.D. from CMU in Machine Learning
• Tim Hunter
• Software engineer @ Databricks
• Ph.D. from UC Berkeley in Machine Learning
• Very early Spark user
Today
• Deep Learning at scale made easy: the vision
• Processing images with DL Pipelines
• Building simple Deep Learning models with transfer learning
• Model deployment via SQL
More advanced topics will be covered during the Q&A and other
meetups.
Deep Learning with ease
What is Deep Learning?
• A set of machine learning techniques that use layers that
transform numerical inputs
• Classification
• Regression
• Arbitrary mapping
• Popular in the 80’s as Neural Networks
• Recently came back thanks to advances in data collection,
computation techniques, and hardware.
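To make "layers that transform numerical inputs" concrete, here is a minimal sketch in plain Python. The weights are hand-picked, hypothetical values; a real network learns them from data, and real frameworks use optimized tensor operations rather than Python lists.

```python
import math

def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus a bias, then a nonlinearity."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, hand-picked parameters (assumption: not from the talk).
HIDDEN_W = [[1.0, -1.0], [0.5, 0.5]]
HIDDEN_B = [0.0, -0.25]
OUTPUT_W = [[1.0, 1.0]]
OUTPUT_B = [-0.5]

def forward(x):
    """Pass a 2-number input through a hidden layer and an output layer."""
    hidden = dense(x, HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]

score = forward([1.0, 0.0])  # a value between 0 and 1, usable as a class score
```

The sigmoid on the last layer keeps the output between 0 and 1, which is the classification case; dropping it would give a regression output instead.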
Success of Deep Learning
• Tremendous success for applications with complex data
• AlphaGo
• Image interpretation
• Automatic translation
• Speech recognition
But still requires a lot of effort
• Low level APIs with steep learning curve
• Tedious to distribute computations
• Not well integrated with other enterprise tools
• No exact science around deep learning
• Success requires many engineer-hours
Deep Learning in industry
• Currently limited adoption
• Huge potential beyond the industrial giants
• How do we accelerate the road to massive availability?
A typical Deep Learning workflow
• Load data (images, text, time series, …)
• Interactive work
• Train
• Select an architecture for a neural network
• Optimize the weights of the NN
• Evaluate results, potentially re-train
• Apply:
• Pass the data through the NN to produce new features or output
How can Spark help?
• A lot of libraries available for Deep Learning in Spark
• TensorFlowOnSpark, BigDL, …
• Goes from simple to very advanced
• See our previous meetup talks for more detail
• Spark is great at scaling out computations
• Distribute the transforms
• Manage the training computation
• Spark MLlib Pipelines
• Simple, concise API to capture the ML workflow
Deep Learning Pipelines:
Deep Learning with Simplicity
• Open-source Databricks library:
https://github.com/databricks/spark-deep-learning
• Focuses on ease of use and integration, without sacrificing
performance
• Scales out common tasks
• Integrates with Spark APIs
• Primary language: Python
Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
• Image loading in Spark
• Deploying models in SQL
• Transfer learning
• Distributed tuning
• Distributed prediction
• Pre-trained models
This talk: ✓ ✓ ✓ ✓
Image processing with DL
Pipelines and Databricks
Adds support for images in Spark
• ImageSchema, reader, conversion functions to/from numpy
arrays
• Most of the tools we’ll describe work on ImageSchema columns
from sparkdl import readImages
image_df = readImages(sample_img_dir)
Applying popular models
• Popular pre-trained models accessible through MLlib
Transformers
from sparkdl import DeepImagePredictor
predictor = DeepImagePredictor(inputCol="image",
                               outputCol="predicted_labels",
                               modelName="InceptionV3")
predictions_df = predictor.transform(image_df)
Fast model training via
transfer learning
Example: Identify the James Bond cars
DEMO
Transfer Learning
[Figure: a pre-trained network whose SoftMax output (GIANT PANDA 0.9, RED PANDA 0.05, RACCOON 0.01, …) is replaced by a new Classifier; the remaining layers form the DeepImageFeaturizer]
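The idea behind the transfer-learning slides can be sketched in plain Python: keep a featurizer fixed and train only a small classifier on its output. This is a toy illustration, not the sparkdl implementation. `frozen_featurizer` is a hypothetical stand-in for a pre-trained network with its final layer removed, and the "images" are made-up pixel lists.

```python
import math

def frozen_featurizer(pixels):
    """Hypothetical stand-in for a pre-trained network minus its last layer.
    It is never updated during training; it only maps pixels to features."""
    mean = sum(pixels) / len(pixels)
    return [mean, max(pixels) - min(pixels)]

def predict_proba(weights, bias, features):
    """Logistic regression: probability that the label is 1."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_classifier(feature_rows, labels, epochs=2000, lr=0.5):
    """Fit only the small classifier with stochastic gradient descent."""
    weights = [0.0] * len(feature_rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(feature_rows, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias

# Made-up data: flat pixel lists in [0, 1]; label 1 = bright, 0 = dark.
images = [[0.9, 0.8, 1.0, 0.9], [0.8, 0.9, 0.7, 0.9],
          [0.1, 0.2, 0.0, 0.1], [0.2, 0.1, 0.3, 0.1]]
labels = [1, 1, 0, 0]

features = [frozen_featurizer(img) for img in images]
weights, bias = train_classifier(features, labels)
predictions = [round(predict_proba(weights, bias, f)) for f in features]
```

Because only the classifier's few parameters are trained, a handful of labeled examples is enough here, which is the reason transfer learning is fast.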
MLlib primer
• MLlib: the machine learning library included with Spark
• Transformer
• Transforms the data: takes a Spark dataframe and appends a new column
• Estimator
• Produces a model (fit)
• Pipeline: sequence of transformers and estimators
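The Transformer / Estimator / Pipeline contract can be illustrated with a toy analog in plain Python. This is not the MLlib API: rows are plain dicts rather than Spark DataFrames, and `Doubler` and `MeanCenter` are hypothetical stages invented for the example.

```python
class Transformer:
    """Takes a table (list of dict rows) and returns it with a new column."""
    def transform(self, rows):
        raise NotImplementedError

class Estimator:
    """Learns from a table and returns a fitted Transformer (the model)."""
    def fit(self, rows):
        raise NotImplementedError

class Doubler(Transformer):
    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col
    def transform(self, rows):
        return [{**r, self.output_col: 2 * r[self.input_col]} for r in rows]

class MeanCenterModel(Transformer):
    def __init__(self, input_col, output_col, mean):
        self.input_col, self.output_col, self.mean = input_col, output_col, mean
    def transform(self, rows):
        return [{**r, self.output_col: r[self.input_col] - self.mean} for r in rows]

class MeanCenter(Estimator):
    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col
    def fit(self, rows):
        mean = sum(r[self.input_col] for r in rows) / len(rows)
        return MeanCenterModel(self.input_col, self.output_col, mean)

class PipelineModel(Transformer):
    def __init__(self, stages):
        self.stages = stages
    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

class Pipeline(Estimator):
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)      # estimators become models
            rows = stage.transform(rows)     # feed the output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)

data = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = Pipeline([Doubler("x", "x2"), MeanCenter("x2", "centered")]).fit(data)
result = model.transform(data)  # each row keeps "x" and gains "x2" and "centered"
```

As in MLlib, the pipeline itself is an estimator and the fitted pipeline is a transformer, so a whole workflow can be trained and applied with one `fit` and one `transform` call.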
Transfer Learning as a Pipeline
[Figure: MLlib Pipeline with stages Image Loading, Preprocessing, DeepImageFeaturizer, and Logistic Regression]
DEMO
Sharing and exporting Deep
Learning models
Model Export and Sharing
[Figure: Deep Learning Model, Classifier]
Shipping predictors in SQL
Take a trained model / Pipeline, register a SQL UDF usable by
anyone in the organization
registerKerasUDF("my_object_recognition_function",
                 keras_model_file="/mymodels/007model.h5")
In Spark SQL:
select image, my_object_recognition_function(image) as objects
from traffic_imgs
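The register-then-query pattern can be tried without a cluster. The sketch below swaps in SQLite from the Python standard library for Spark SQL, so it shows only the general idea of exposing a Python function under a SQL name. `recognize_objects` is a hypothetical stand-in for a trained model, and the table contents are made up.

```python
import sqlite3

def recognize_objects(image_bytes):
    """Hypothetical stand-in for a trained model: labels an image by its size."""
    return "large" if len(image_bytes) > 4 else "small"

conn = sqlite3.connect(":memory:")

# Register the Python function under a SQL-visible name:
# (SQL name, number of arguments, Python callable).
conn.create_function("my_object_recognition_function", 1, recognize_objects)

# Made-up table of image blobs, named after the table on the slide.
conn.execute("create table traffic_imgs (image blob)")
conn.executemany(
    "insert into traffic_imgs values (?)",
    [(b"\x00\x01",), (b"\x00\x01\x02\x03\x04\x05",)],
)

# Anyone who can write SQL can now call the function like a built-in.
rows = conn.execute(
    "select my_object_recognition_function(image) as objects "
    "from traffic_imgs order by rowid"
).fetchall()
```

The point of the design is the separation of roles: whoever trains the model registers it once, and SQL users call it by name without touching Python or the model file.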
DEMO
Conclusion
Deep Learning without Deep Pockets
• Simple API for Deep Learning, integrated with MLlib
• Scales common tasks with transformers and estimators
• Embeds Deep Learning models in MLlib and SparkSQL
• Early release of Deep Learning Pipelines
https://github.com/databricks/spark-deep-learning
Deep Learning Pipelines - future
In progress
• Hyper-parameter tuning for Keras models
• Official image support in Spark
• Scala API
(Potential) future work
• Text models
• Support for more backends, e.g. MXNet, PyTorch, BigDL
Resources
Blog posts & webinars — http://databricks.com/blog
• Deep Learning Pipelines
• GPU acceleration in Databricks
• BigDL on Databricks
• Deep Learning and Apache Spark
Docs for Deep Learning on Databricks — http://docs.databricks.com
• Getting started
• Deep Learning Pipelines Example
• Spark integration
https://spark-summit.org/eu-2017/
15% Discount code: Databricks
https://databricks.com/company/careers
Thank You!
Questions?
Happy Sparking & Deep Learning!
