Distributed Deep Learning with Docker at Salesforce
Jeff Hajewski
Software Engineer, Salesforce
github.com/j-haj
jeff-hajewski-3a1b5a29
Caveats
● These are my own views and opinions, not those of Salesforce
● This is how one team at Salesforce deploys deep learning
models
● When I use the term Docker I am referring to the
technology, not the company
● Some of these designs are simplified
● What is deep learning and why is it difficult?
● Deep learning at Salesforce
● Challenges
○ Designing for team specialization
○ Interacting with the model server
○ Testing
● Key takeaways
About this talk
The core task of deep learning is function approximation.
Neural networks can approximate any function.
Neural networks are expensive to evaluate.
● Linear regression: ~1,000 parameters
● Deep neural network: 100M - 1B parameters (100,000 - 1M x linear reg.)
Deep Learning Review
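To make the parameter comparison concrete, here is a minimal sketch that counts the weights and biases of a fully connected network. The layer sizes are hypothetical, chosen only so the totals land in the ranges quoted on the slide; they do not describe any Salesforce model.

```python
from typing import Sequence


def dense_param_count(layer_sizes: Sequence[int]) -> int:
    """Count weights plus biases for a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical sizes, for illustration only.
# A linear regression on 1,000 features: 1,000 weights plus one bias.
linear_regression = dense_param_count([1_000, 1])
# A dense network with three wide hidden layers.
deep_network = dense_param_count([1_000, 8_192, 8_192, 8_192, 1_000])

ratio = deep_network / linear_regression
print(linear_regression, deep_network, round(ratio))
```

Every one of those parameters takes part in each forward pass, which is why evaluating the network costs so much more than evaluating the regression.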
How should we design
distributed systems
for deep learning (high latency tasks)?
We use deep learning models to provide our customers
useful information about their sales process.
Customers send us their data as a streaming firehose.
The faster we return results to our customers, the more
useful and actionable they are for their sales teams.
Deep Learning at Salesforce
There are three steps to this process
1. Preprocessing - cleaning and formatting the data
2. Inference - running the data through the model
3. Postprocessing - interpreting the output from the model
Deep Learning at Salesforce
Deep Learning at Salesforce
[Diagram: “Hello! My cat is friendly.” → preprocess → [0.2, 0.71, 0.89, 0.6] → inference → [0.85, 0.15] → postprocess → “Discusses cat”. Later builds of the slide add the labels “Data Science Team” and “Systems Team”.]
Challenge 1: designing for team specialization
Requirements
1. The data science team shouldn’t need to know
about the system. They just want to define a
sequence of computation.
2. The systems engineers shouldn’t need to know
anything about the computation. They just want to
scale the system.
Designing for team specialization
Challenges
1. Some functions take longer to execute than others
(e.g., model inference)
2. The order of execution is important
Designing for team specialization
Solution: map functions to containers
Designing for team specialization
postprocess(inference(preprocess(x)))
preprocess inference postprocess
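The mapping from functions to stages can be sketched as plain function composition. The three function bodies below are toy stand-ins invented for illustration (the real stages are not shown in the talk); the point is only that each stage has a single input and a single output, so each could run in its own container without the others knowing.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence


def preprocess(text: str) -> List[str]:
    """Toy cleaning step: strip punctuation and lowercase each word."""
    return [word.strip(".,!?").lower() for word in text.split()]


def inference(tokens: List[str]) -> List[float]:
    """Toy 'model': returns [p(cat), p(not cat)]."""
    p_cat = 0.85 if "cat" in tokens else 0.15
    return [p_cat, 1.0 - p_cat]


def postprocess(probabilities: List[float]) -> str:
    """Turn the model output into something a customer can read."""
    return "Discusses cat" if probabilities[0] >= 0.5 else "Does not discuss cat"


def compose(stages: Sequence[Callable[[Any], Any]]) -> Callable[[Any], Any]:
    """Chain stages so the output of one becomes the input of the next."""
    return lambda x: reduce(lambda value, stage: stage(value), stages, x)


pipeline = compose([preprocess, inference, postprocess])
result = pipeline("Hello! My cat is friendly.")
print(result)
```

The data science team only edits the list passed to `compose`; the systems team only needs to know that each entry is an independent unit of work.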
What about throughput?
[Diagram: binary input (0110010 1001111 ...) → preprocess → inference → postprocess → “It’s a cat!”. Rates shown along the pipeline: 1,000 QPS, 500 QPS, 300 QPS, 1,000 QPS, with a “Max throughput” marker. A second build adds replica boxes for preprocess and inference, with “2x” beside 500 QPS and “4x” beside 300 QPS.]
Docker enables us to easily scale out each individual stage
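A pipeline runs only as fast as its slowest stage, so scaling out means giving each stage enough replicas to reach the target rate. The sketch below assumes that the 500 QPS figure belongs to preprocess, 300 QPS to inference, and 1,000 QPS to postprocess; that assignment is a reading of the slide, not something the talk states explicitly.

```python
from math import ceil
from typing import Dict

# Assumed per-replica capacity of each stage, in queries per second.
capacity_qps: Dict[str, int] = {
    "preprocess": 500,
    "inference": 300,
    "postprocess": 1_000,
}


def pipeline_throughput(capacity: Dict[str, int], replicas: Dict[str, int]) -> int:
    """End-to-end throughput is limited by the slowest (scaled) stage."""
    return min(qps * replicas.get(stage, 1) for stage, qps in capacity.items())


def replicas_needed(capacity: Dict[str, int], target_qps: int) -> Dict[str, int]:
    """Smallest replica count per stage that reaches the target rate."""
    return {stage: ceil(target_qps / qps) for stage, qps in capacity.items()}


unscaled = pipeline_throughput(capacity_qps, {})
plan = replicas_needed(capacity_qps, target_qps=1_000)
scaled = pipeline_throughput(capacity_qps, plan)
print(unscaled, plan, scaled)
```

Under these assumptions the plan comes out to two preprocess replicas and four inference replicas, which is consistent with the 2x and 4x labels on the slide.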
What about throughput?
[Diagram: the same pipeline with replicated stages, now with “2x” beside 500 QPS and “2x” beside 300 QPS.]
Kafka gives stage-wise checkpointing
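One way to picture stage-wise checkpointing: each stage reads from a log, writes its result to the next log, and then records how far it has read. The sketch below uses an in-memory class as a stand-in for a Kafka topic. It is not the Kafka client API, and the names `Topic` and `run_stage` are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class Topic:
    """In-memory stand-in for a topic: an append-only log plus the
    committed read offset of each consumer group."""

    def __init__(self) -> None:
        self.log: List[object] = []
        self.committed: Dict[str, int] = {}

    def append(self, record: object) -> None:
        self.log.append(record)

    def poll(self, group: str) -> List[object]:
        """Return every record the group has not committed yet."""
        return self.log[self.committed.get(group, 0):]

    def commit(self, group: str, offset: int) -> None:
        self.committed[group] = offset


def run_stage(name: str, fn: Callable, source: Topic, sink: Topic,
              fail_after: Optional[int] = None) -> None:
    """Process uncommitted records, committing after each write.
    fail_after simulates a crash after that many records."""
    start = source.committed.get(name, 0)
    for i, record in enumerate(source.poll(name)):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError(f"{name} crashed")
        sink.append(fn(record))
        source.commit(name, start + i + 1)


raw, cleaned = Topic(), Topic()
for message in ["a", "b", "c"]:
    raw.append(message)

try:
    run_stage("preprocess", str.upper, raw, cleaned, fail_after=2)
except RuntimeError:
    pass  # the stage died after two records

# A restarted stage resumes from the committed offset, not from the start.
run_stage("preprocess", str.upper, raw, cleaned)
print(cleaned.log)
```

Because the write happens before the commit, a crash between the two would replay one record in a real system (at-least-once delivery), so downstream stages should tolerate duplicates.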
Challenge 2: interacting with the model servers
Model servers provide a way to query the model,
typically via gRPC or HTTP.
What is the best way to deploy and interact with these
model servers?
Serving deep learning models
Challenges:
1. Model servers are designed as standalone processes.
2. How should we best utilize multiple GPUs?
3. What about networking?
Interacting with the model server
We want to keep deployment simple!
Solution: Deploy model server images as part of a “pod” or
“group” with a coordinator service
Interacting with the model server
[Diagram: a pod containing a JVM manager and several model servers.]
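The coordinator's job can be sketched as a small dispatcher that spreads requests across the model servers in its pod. The talk's manager runs on the JVM; Python is used here only for illustration, and the ports, the `Coordinator` class, and the `fake_send` transport are all hypothetical.

```python
from itertools import cycle
from typing import Callable, List


class Coordinator:
    """Round-robin dispatcher over the model servers in one pod."""

    def __init__(self, addresses: List[str],
                 send: Callable[[str, bytes], bytes]) -> None:
        if not addresses:
            raise ValueError("at least one model server address is required")
        self._addresses = cycle(addresses)
        self._send = send  # transport is injected (gRPC or HTTP in practice)

    def infer(self, payload: bytes) -> bytes:
        """Forward the payload to the next server in the rotation."""
        return self._send(next(self._addresses), payload)


def fake_send(address: str, payload: bytes) -> bytes:
    """Stand-in transport that just reports which server was chosen."""
    return address.encode() + b" <- " + payload


# Hypothetical ports; one model server per GPU would be a natural layout.
servers = [f"http://localhost:{port}" for port in (8500, 8501, 8502)]
coordinator = Coordinator(servers, fake_send)
replies = [coordinator.infer(b"x") for _ in range(4)]
print(replies)
```

Callers talk only to the coordinator, so the number of model servers behind it can change without touching the rest of the pipeline.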
This solves additional challenges
1. Who owns the model server? The data science team.
2. How should we handle model versions? Where are they
stored locally? On a Docker shared volume.
3. What are the addresses of the model servers?
http://localhost via Docker private networking.
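With models on a shared volume, picking the version to serve can be as simple as scanning a directory. The sketch below assumes one numbered subdirectory per model version, which is a layout chosen for illustration; the talk does not describe how the volume is organized.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_model_version(model_dir: Path) -> Optional[Path]:
    """Return the subdirectory with the highest numeric name, if any."""
    versions = [
        path for path in model_dir.iterdir()
        if path.is_dir() and path.name.isdigit()
    ]
    return max(versions, key=lambda path: int(path.name), default=None)


# Demonstrate against a temporary directory standing in for the shared volume.
with tempfile.TemporaryDirectory() as root:
    for name in ("1", "2", "10", "scratch"):
        (Path(root) / name).mkdir()
    chosen = latest_model_version(Path(root)).name

print(chosen)
```

Comparing names as integers matters here: a plain string sort would rank "2" above "10".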
Challenge 3: testing
Challenge: how should we test these systems?
1. Deep learning models are probabilistic
2. Interservice interactions can be quite complex
Testing
Solution: Docker Compose
● Makes it easy to swap out the model server with a mock
service
● Deploying the entire system locally is easy
● Integrates well with Maven and Gradle
Testing
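Swapping in a mock service addresses the problem of probabilistic outputs: the mock always returns the same answer, so the rest of the system can be tested deterministically. Below is a minimal sketch of such a mock using only the Python standard library. The `/predict` path and the JSON shape are hypothetical, not the API of any particular model server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


class MockModelHandler(BaseHTTPRequestHandler):
    """Always returns the same prediction so tests are repeatable."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume and ignore the request body
        body = json.dumps({"probabilities": [0.85, 0.15]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep test output quiet


def query_model(url: str, features: List[float]) -> List[float]:
    """Client code under test: POST features, read back probabilities."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"features": features}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["probabilities"]


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), MockModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = f"http://127.0.0.1:{server.server_port}/predict"
    probabilities = query_model(url, [0.2, 0.71, 0.89, 0.6])
finally:
    server.shutdown()
    server.server_close()

print(probabilities)
```

In a Compose-based setup, the same idea applies at the container level: the test configuration points the service name at an image running a mock like this one instead of the real model server.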
We haven’t spent a lot of time
discussing the details of Docker.
That is precisely the point!
● Docker allows us to simplify many aspects of our design.
● Docker stays out of the way.
● Docker provides a simple alternative to a much more
complex solution.
Docker simplifies our lives