Distributed Deep Learning with Docker at Salesforce

Jeff Hajewski
Software Engineer, Salesforce
github.com/j-haj
jeff-hajewski-3a1b5a29

Caveats
● These are my own views and opinions, not those of Salesforce
● This is how one team at Salesforce deploys deep learning models
● When I use the term Docker, I am referring to the technology, not the company
● Some of these designs are simplified

About this talk
● What is deep learning and why is it difficult?
● Deep learning at Salesforce
● Challenges
○ Designing for team specialization
○ Interacting with the model server
○ Testing
● Key takeaways

Deep Learning Review
The core task of deep learning is function approximation.
With enough parameters, neural networks can approximate a very broad class of functions (the universal approximation theorem).
Neural networks are expensive to evaluate.
● Linear regression: ~1,000 parameters
● Deep neural network: 100M to 1B parameters (100,000x to 1,000,000x linear regression)

How should we design distributed systems for deep learning (high-latency tasks)?
Deep Learning at Salesforce
We use deep learning models to provide our customers with useful information about their sales process.
Customers send us their data as a streaming firehose.
The faster we get the results back to our customers, the more useful and actionable they are for their sales teams.

There are three steps to this process:
1. Preprocessing - cleaning and formatting the data
2. Inference - running the data through the model
3. Postprocessing - interpreting the output from the model
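
The three steps above can be sketched as three plain functions. This is only an illustration: the vocabulary, the rule-based stand-in for the model, and the "Does not discuss cat" label are hypothetical, and a real deployment would call a trained network in the inference step.

```python
from typing import List

# Hypothetical vocabulary, used only for this illustration.
VOCAB = ["cat", "dog", "friendly", "hello"]


def preprocess(text: str) -> List[float]:
    """Clean the raw text and turn it into a fixed-length feature vector."""
    words = [w.strip('.,!?"').lower() for w in text.split()]
    return [float(words.count(term)) for term in VOCAB]


def inference(features: List[float]) -> List[float]:
    """Stand-in for the model: returns scores for [cat, not cat]."""
    cat_score = 0.85 if features[0] > 0 else 0.15
    return [cat_score, 1.0 - cat_score]


def postprocess(scores: List[float]) -> str:
    """Interpret the model output as a human-readable label."""
    return "Discusses cat" if scores[0] >= scores[1] else "Does not discuss cat"


result = postprocess(inference(preprocess("Hello! My cat is friendly.")))
print(result)  # Discusses cat
```

Only the inference step is expensive in practice, which is why the rest of the talk treats the three steps as separately scalable units.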

[Figure: “Hello! My cat is friendly.” → preprocess → [0.2, 0.71, 0.89, 0.6] → inference → [0.85, 0.15] → postprocess → “Discusses cat”]

[Figure: the same example pipeline, now labeled “Data Science Team”]

[Figure: the same example pipeline, now labeled “Data Science Team” and “Systems Team”]

Challenge 1: designing for team specialization

Designing for team specialization
Requirements
1. The data science team shouldn’t need to know about the system. They just want to define a sequence of computations.
2. The systems engineers shouldn’t need to know anything about the computation. They just want to scale the system.

Challenges
1. Some functions take longer to execute than others (e.g., model inference)
2. The order of execution is important

Solution: map functions to containers
postprocess(inference(preprocess(x)))
[Figure: three containers: preprocess → inference → postprocess]
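
A minimal sketch of this mapping, assuming threads stand in for containers and in-process queues stand in for the messaging layer between them. The names `run_stage` and `run_pipeline` and the toy stage functions are hypothetical. Each function runs as its own worker, and the chain of queues preserves the order of execution.

```python
import queue
import threading
from typing import Any, Callable, List

_STOP = object()  # sentinel that tells a stage to shut down


def run_stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to every message from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown downstream
            return
        outbox.put(fn(item))


def run_pipeline(stages: List[Callable[[Any], Any]], inputs: List[Any]) -> List[Any]:
    """Chain the stages with queues so each one runs independently, in order."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for worker in workers:
        worker.start()
    for x in inputs:
        queues[0].put(x)
    queues[0].put(_STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results


# Toy stand-ins for the three functions.
def preprocess(x: int) -> int:
    return x + 1


def inference(x: int) -> int:
    return x * 10


def postprocess(x: int) -> str:
    return f"score={x}"


outputs = run_pipeline([preprocess, inference, postprocess], [1, 2, 3])
print(outputs)  # ['score=20', 'score=30', 'score=40']
```

The data science side only supplies the list of functions; everything about workers and queues lives in `run_pipeline`, which is the part a systems team would own and scale.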

What about throughput?
[Figure: a pipeline that turns binary input (0110010 1001111 0110010 1000011) into “It’s a cat!”. Labels: 1,000 QPS, 500 QPS, 300 QPS; max throughput: 1,000 QPS]

Docker enables us to easily scale out each individual stage
[Figure: the pipeline with several inference containers and a preprocess container. Labels: 1,000 QPS, 2x 500 QPS, 4x 300 QPS; max throughput: 1,000 QPS]
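
The scaling arithmetic behind the 2x and 4x labels can be sketched as follows. The assignment of the 500, 300, and 1,000 QPS figures to particular stages is an assumption made for illustration, and the function names are hypothetical.

```python
import math
from typing import Dict


def replicas_needed(target_qps: float, per_replica_qps: Dict[str, float]) -> Dict[str, int]:
    """Smallest replica count per stage that sustains target_qps."""
    return {stage: math.ceil(target_qps / qps) for stage, qps in per_replica_qps.items()}


def pipeline_throughput(per_replica_qps: Dict[str, float], replicas: Dict[str, int]) -> float:
    """A pipeline runs only as fast as its slowest stage."""
    return min(per_replica_qps[stage] * replicas[stage] for stage in per_replica_qps)


# Hypothetical assignment of the slide's numbers to stages.
stage_qps = {"preprocess": 500, "inference": 300, "postprocess": 1000}

single = {stage: 1 for stage in stage_qps}
print(pipeline_throughput(stage_qps, single))  # 300

scaled = replicas_needed(1000, stage_qps)
print(scaled)  # {'preprocess': 2, 'inference': 4, 'postprocess': 1}
print(pipeline_throughput(stage_qps, scaled))  # 1000
```

With one container per stage the slowest stage caps the pipeline at 300 QPS. Running two copies of the 500 QPS stage and four copies of the 300 QPS stage lifts every stage to at least 1,000 QPS.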

Kafka gives stage-wise checkpointing
[Figure: the scaled-out pipeline (preprocess and inference containers). Labels: 1,000 QPS, 2x 500 QPS, 2x 300 QPS; max throughput: 1,000 QPS]
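
One way to picture stage-wise checkpointing is the sketch below. It does not use Kafka: the `Topic` class is a hypothetical in-memory stand-in for a topic with a single committed consumer offset, and `fail_at` simulates a container crash. The point is that a restarted stage resumes from its last committed offset, so earlier stages never redo their work.

```python
from typing import Callable, List, Optional


class Topic:
    """In-memory stand-in for a topic: an append-only log plus one committed offset."""

    def __init__(self) -> None:
        self.log: List[object] = []
        self.committed = 0  # index of the next record the consumer still has to process

    def append(self, record: object) -> None:
        self.log.append(record)


def run_stage(fn: Callable, source: Topic, sink: Topic, fail_at: Optional[int] = None) -> None:
    """Process records starting at the committed offset, committing after each one."""
    while source.committed < len(source.log):
        offset = source.committed
        if offset == fail_at:
            raise RuntimeError(f"stage crashed at offset {offset}")
        sink.append(fn(source.log[offset]))
        source.committed = offset + 1


raw, features, scores = Topic(), Topic(), Topic()
for x in [1, 2, 3, 4]:
    raw.append(x)

preprocess_calls = []


def preprocess(x: int) -> int:
    preprocess_calls.append(x)
    return x + 1


def inference(x: int) -> int:
    return x * 10


run_stage(preprocess, raw, features)
try:
    run_stage(inference, features, scores, fail_at=2)
except RuntimeError:
    pass  # the inference container died partway through
run_stage(inference, features, scores)  # a restarted container resumes at offset 2

print(scores.log)        # [20, 30, 40, 50]
print(preprocess_calls)  # [1, 2, 3, 4]
```

Preprocessing ran exactly once per record even though inference crashed. In a real system, committing after processing generally gives at-least-once delivery, so a stage should tolerate seeing a record twice.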

Challenge 2: interacting with the model servers

Serving deep learning models
Model servers provide a way to query the model, typically via gRPC or HTTP.
What is the best way to deploy and interact with these model servers?

Interacting with the model server
Challenges:
1. Model servers are designed as standalone processes.
2. How should we best utilize multiple GPUs?
3. What about networking?
We want to keep deployment simple!

Interacting with the model server
Solution: Deploy model server images as part of a “pod” or “group” with a coordinator service
[Figure: a pod labeled JVM and Manager, plus several Model Server boxes]
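
The talk does not say how the manager distributes work across the model servers in a pod. As one plausible policy, the sketch below uses round-robin dispatch. The `Manager` class, the `localhost` ports, and the fake model server are all hypothetical; a real manager would call each server over gRPC or HTTP.

```python
import itertools
from typing import Callable, Dict, List, Tuple

ModelServer = Callable[[List[float]], List[float]]


class Manager:
    """Hypothetical coordinator that spreads requests over the pod's model servers."""

    def __init__(self, servers: Dict[str, ModelServer]) -> None:
        if not servers:
            raise ValueError("a pod needs at least one model server")
        self._servers = servers
        self._addresses = itertools.cycle(sorted(servers))  # round-robin order

    def predict(self, features: List[float]) -> Tuple[str, List[float]]:
        """Send the request to the next server and report which one handled it."""
        address = next(self._addresses)
        return address, self._servers[address](features)


def fake_model_server(features: List[float]) -> List[float]:
    """Stand-in for a real model server process."""
    return [0.85, 0.15]


pod = Manager({
    "localhost:9001": fake_model_server,
    "localhost:9002": fake_model_server,
})
used = [pod.predict([0.2, 0.71, 0.89, 0.6])[0] for _ in range(4)]
print(used)  # ['localhost:9001', 'localhost:9002', 'localhost:9001', 'localhost:9002']
```

Callers talk only to the manager, so the number of model servers in the pod can change without touching client code.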

This solves additional challenges
1. Who owns the model server? The data science team
2. How should we handle model versions? Where are they stored locally? A Docker shared volume
3. What are the addresses of the model servers? http://localhost via Docker private networking

Testing
Challenge: How should we test these systems?
1. Deep learning models are probabilistic
2. Interservice interactions can be quite complex

Testing
Solution: Docker Compose
● Makes it easy to swap out the model server with a mock service
● Deploying the entire system locally is easy
● Integrates well with Maven and Gradle
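
To illustrate the mock-service idea, here is a minimal sketch of a mock model server and a client that queries it. The `/predict` path, the JSON payload shape, and the fixed scores are assumptions for this example, not the API of any particular model server. Because the mock always returns the same scores, tests of the surrounding services become deterministic.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


class MockModelHandler(BaseHTTPRequestHandler):
    """Mock model server: always returns the same scores."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body; the mock ignores it
        body = json.dumps({"scores": [0.85, 0.15]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep test output quiet


def query_model(url: str, features: List[float]) -> List[float]:
    """Client code under test: POST the features and read back the scores."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any proxy settings so the request stays on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["scores"]


server = HTTPServer(("127.0.0.1", 0), MockModelHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = f"http://127.0.0.1:{server.server_port}/predict"
    scores = query_model(url, [0.2, 0.71, 0.89, 0.6])
    print(scores)  # [0.85, 0.15]
finally:
    server.shutdown()
    server.server_close()
```

In a Compose-based setup, a service like this could take the place of the real model server image during tests while the other services run unchanged.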

We haven’t spent a lot of time discussing the details of Docker.
That is precisely the point!

Docker simplifies our lives
● Docker allows us to simplify many aspects of our design.
● Docker stays out of the way.
● Docker provides a simple alternative to a much more complex solution.
