Scaling Apache Spark on Kubernetes at Lyft

Li Gao, Lyft
Rohit Menon, Lyft

Introduction
Li Gao
Works in the Data Platform team at Lyft, currently leading the Compute Infra
initiatives including Spark on Kubernetes.
Previously at Salesforce, Fitbit, Groupon, and other startups.
Rohit Menon
Rohit Menon is a Software Engineer on the Data Platform team at Lyft. Rohit's
primary area of focus is building and scaling out the Spark and Hive Infrastructure
for ETL and machine learning use cases.
Previously at EA and VMware.

Agenda
● Introduction to the data landscape at Lyft
● The challenges we face
● How Apache Spark on Kubernetes can help
● Remaining work

Data Landscape
● Batch data Ingestion and ETL
● Data Streaming
● ML platforms
● Notebooks and BI tools
● Query and Visualization
● Operational Analytics
● Data Discovery & Lineage
● Workflow orchestration
● Cloud Platforms

Evolving Batch Architecture
● 2016-2017: Vendor-based Hadoop
● 2017-2018: Hive on MR, Vendor Presto
● Mid 2018: Hive on Tez + Spark Adhoc
● Late 2018: Spark on Vendor GA
● Early 2019: Spark on K8s Alpha
● Future: Spark on K8s Beta

What batch compute is used for
[Diagram: data sources (Events, Ext Data, RDB/KV, Sys Events); Ingest Pipelines; AWS S3; Batch Compute Clusters; AWS S3 and HMS; Presto, Hive Client, and BI Tools; consumers (Analysts, Engineers, Scientists, Services)]

Batch Compute Challenges
● 3rd Party vendor dependency limitations
● Data ETL expressed solely in SQL
● Complex logic expressed in Python that is hard to adapt to SQL
● Different dependencies and versions
● Resource load balancing for heterogeneous workloads

3rd Party Vendor Limitations
● Proprietary patches
● Inconsistent bootstrap
● Release schedule
● Homogeneous environments

Is SQL the complete solution?

What about Python functions?
“I want to express my processing logic in Python functions
with external geo libraries (e.g. GeoMesa) and interact with
Hive tables” - Lyft data engineer

How Spark can help?
[Diagram: Spark at the center, connecting Applications, APIs, Environments, and Data Sources and Data Sinks (including RDB/KV)]

What challenges remain?
● Per job custom dependencies
● Handling version requirements (Py3 vs. Py2)
● Still need to run on shared clusters for cost efficiency

What about dependencies?
● RTree Libraries
● Data Codecs
● Spatial Libraries

Different Spark or Hive versions?
● Legacy jobs that require Spark 2.2
● Newer jobs that require Spark 2.3 or Spark 2.4
● Hive 2.1 SQL and Hive 2.3
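
One way to picture how containers can serve several Spark versions side by side is a per-job lookup from the requested version to a container image. This is a minimal sketch, not the setup described in the talk: the registry, image tags, and the `resolve_image` helper are all hypothetical.

```python
# Hypothetical mapping from a job's Spark version to a container image.
# The registry name and tags are placeholders, not real images.
SPARK_IMAGES = {
    "2.2": "registry.example.com/spark:2.2",
    "2.3": "registry.example.com/spark:2.3",
    "2.4": "registry.example.com/spark:2.4",
}


def resolve_image(spark_version: str) -> str:
    """Return the container image for the Spark version a job asks for."""
    try:
        return SPARK_IMAGES[spark_version]
    except KeyError:
        supported = ", ".join(sorted(SPARK_IMAGES))
        raise ValueError(
            f"unsupported Spark version {spark_version!r}; supported: {supported}"
        ) from None
```

Because each job resolves its own image, a legacy job pinned to one version and a newer job on another can share the same cluster without sharing an installation.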

How Kubernetes can help?
● Operators & Controllers
● Pods, Ingress, Services
● Namespaces
● Immutability
● Event driven & Declarative
● Community + CNCF
● Service Mesh
● Multi-Tenancy Support

What challenges still remain?
● Spark on K8s is still in its early days
● Single cluster scaling limit
● CRD and control plane update
● Pod churn and IP allocation throttling
● ECR container registry reliability

Current scale
● Tens of PB in the data lake
● On the order of 100k batch jobs running daily
● Thousands of EC2 nodes spanning multiple clusters and AZs
● Thousands of workflows running daily

How Lyft scales Spark on K8s
● # of Clusters
● # of Namespaces
● # of Pods
● Pod Churn Rate
● # of Nodes
● Pod Size
● Job:Pod ratio
● IP Alloc Rate Limit
● ECR Rate Limit
● Affinity & Isolation
● QoS & Quota

HA in Cluster Pool
[Diagram: Cluster Pool A, shown with Cluster 1, Cluster 2, Cluster 3, and Cluster 4]
● Cluster rotation within a cluster pool
● Automated provisioning of a new cluster, which is then (manually) added into rotation
● Throttle at the lower bound while a rotation is in progress
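
The rotation behavior in these bullets can be sketched as a small model. This is an illustration under stated assumptions, not Lyft's implementation: the `ClusterPool` class, its method names, the round-robin routing, and the two job limits are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterPool:
    """Hypothetical model of a cluster pool that supports rotation."""

    name: str
    in_rotation: List[str] = field(default_factory=list)
    max_jobs: int = 100  # assumed admission limit in normal operation
    min_jobs: int = 20   # assumed lower bound applied during rotation
    rotating: bool = False
    _next: int = 0

    def admission_limit(self) -> int:
        # Throttle at the lower bound while a rotation is in progress.
        return self.min_jobs if self.rotating else self.max_jobs

    def begin_rotation(self, old_cluster: str) -> None:
        # Take the old cluster out so no new jobs are routed to it.
        self.in_rotation.remove(old_cluster)
        self.rotating = True

    def finish_rotation(self, new_cluster: str) -> None:
        # Add the freshly provisioned cluster and lift the throttle.
        self.in_rotation.append(new_cluster)
        self.rotating = False

    def pick_cluster(self) -> str:
        # Simple round-robin over the clusters currently in rotation.
        cluster = self.in_rotation[self._next % len(self.in_rotation)]
        self._next += 1
        return cluster
```

Splitting rotation into a begin and a finish step mirrors the manual "add into rotation" action: the pool stays throttled until someone confirms the new cluster.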

Multiple Namespaces (Groups)
[Diagram: Namespace 1, Namespace 2, and Namespace 3, each holding several pods, placed on Node A, Node B, Node C, and Node D; labels show IAM roles (Role1, Role1, Role2) and maximum pod sizes (Max Pod Size 1, Max Pod Size 2)]
● A practical limit of ~3K active pods per namespace was observed
● Less preemption is required when namespaces are isolated by quota
● Different namespaces can map to different IAM roles and sidecar
configurations
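
A job router could respect the per-namespace pod figure with a check like the one below. It is a sketch only: the `pick_namespace` helper and the least-loaded selection rule are assumptions, and the 3000 limit simply mirrors the ~3K observation above.

```python
from typing import Dict, Optional

MAX_ACTIVE_PODS = 3000  # mirrors the ~3K practical figure; tune per cluster


def pick_namespace(
    active_pods: Dict[str, int],
    pods_needed: int,
    limit: int = MAX_ACTIVE_PODS,
) -> Optional[str]:
    """Return the least-loaded namespace that can fit the job, else None."""
    candidates = [
        (count, namespace)
        for namespace, count in active_pods.items()
        if count + pods_needed <= limit
    ]
    if not candidates:
        return None  # the caller could queue the job or try another cluster
    return min(candidates)[1]
```

For example, with `{"ns1": 2900, "ns2": 1200}` and a job needing 200 pods, only `ns2` has room, so it is chosen.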

Pod Sharing
[Diagram: a Job Controller with a Spark driver pod and Spark executor pods, contrasting Shared Pods (Job 1, Job 2, Job 3, and Job 4, with dependencies loaded from AWS S3) with Dedicated & Isolated Pods (separate driver and executor pods for Job 2 and Job 3, each with its own dependencies)]

DDL Separation to reduce churn

Pod Priority and Preemption (WIP)
● Priority-based preemption
● Driver pod has higher priority than executor pod
● Experimental
[Diagram: the K8s Scheduler handles a new pod request (E5). Before: D1 D2 E1 E2 E3 E4. After: D1 D2 E5 E2 E3 E4. Evicted: E1.]
https://github.com/kubernetes/kubernetes/issues/71486
https://github.com/kubernetes/enhancements/issues/564
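
The idea of priority-based preemption, with drivers ranked above executors, can be illustrated with a toy scheduler. This is a simplified sketch, not the Kubernetes scheduler's actual algorithm: the priority values, the `Pod` type, and the single-node capacity model are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DRIVER_PRIORITY = 100   # assumed values; only the ordering matters
EXECUTOR_PRIORITY = 10


@dataclass(frozen=True)
class Pod:
    name: str
    priority: int


def schedule(
    running: List[Pod], new_pod: Pod, capacity: int
) -> Tuple[List[Pod], Optional[Pod]]:
    """Place new_pod, evicting the lowest-priority pod if there is no room.

    Returns (pods after scheduling, evicted pod or None). If no running pod
    has a lower priority than new_pod, nothing changes and new_pod is left
    pending.
    """
    if len(running) < capacity:
        return running + [new_pod], None
    victim = min(running, key=lambda p: p.priority)
    if victim.priority >= new_pod.priority:
        return running, None
    survivors = [p for p in running if p is not victim]
    return survivors + [new_pod], victim
```

With two drivers and four executors filling six slots, a new driver request evicts one executor, while a new executor request stays pending because nothing running ranks below it.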

Taints and Tolerations (WIP)
[Diagram: Node A through Node F running pods P1 to P10. Core Nodes (Taint) host the Controllers and Watchers; Worker Nodes (Taint) host Job 1 and Job 2.]
● Other considerations: node labels and node selectors to separate GPU-based and CPU-based
workloads
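
The core/worker split can be pictured with a simplified model of taint matching: a pod is eligible for a node only if it tolerates every taint on that node. This sketch ignores taint effects and toleration operators that real Kubernetes supports, and the `role` key, node names, and helper functions are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (key, value) pairs; effects such as NoSchedule are left out of this model.
Taint = Tuple[str, str]


def tolerates(node_taints: Set[Taint], pod_tolerations: Set[Taint]) -> bool:
    """A pod may land on a node only if it tolerates all of its taints."""
    return node_taints <= pod_tolerations


def schedulable_nodes(
    nodes: Dict[str, Set[Taint]], pod_tolerations: Set[Taint]
) -> List[str]:
    """Return the sorted names of nodes the pod is allowed to run on."""
    return sorted(
        name for name, taints in nodes.items()
        if tolerates(taints, pod_tolerations)
    )


# Hypothetical layout: two core nodes for controllers, two worker nodes for jobs.
NODES: Dict[str, Set[Taint]] = {
    "node-a": {("role", "core")},
    "node-b": {("role", "core")},
    "node-c": {("role", "worker")},
    "node-d": {("role", "worker")},
}
```

A Spark job pod that tolerates only `("role", "worker")` is kept off the core nodes, so controllers and watchers do not compete with job pods for resources.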

What about ECR reliability?
[Diagram: ECR Container Images delivered to Node 1, Node 2, and Node 3 through a DaemonSet + Docker in Docker, with pods running on each node]

Spark Job Config Overlays (DML)
[Diagram: the Config Composer & Event Watcher combines the layers below into the Final Spark Job Config, which goes to the Spark Operator]
● Cluster Pool Defaults
● Cluster Defaults
● Spark Job User Specified Config
● Cluster and Namespace Overrides
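
The overlay idea on this slide can be sketched as a left-to-right merge of config layers. The precedence order used here (pool defaults, then cluster defaults, then user config, then cluster and namespace overrides) is an assumption read from the slide's ordering, and `compose_config` and the sample values are hypothetical; the keys are standard Spark configuration properties.

```python
from typing import Dict


def compose_config(*layers: Dict[str, str]) -> Dict[str, str]:
    """Merge config layers left to right; later layers win on conflicts."""
    final: Dict[str, str] = {}
    for layer in layers:
        final.update(layer)
    return final


# Sample values are illustrative only.
cluster_pool_defaults = {
    "spark.executor.memory": "4g",
    "spark.executor.instances": "2",
}
cluster_defaults = {"spark.kubernetes.namespace": "default"}
user_config = {
    "spark.executor.memory": "8g",
    "spark.executor.instances": "10",
}
cluster_namespace_overrides = {
    "spark.kubernetes.namespace": "ns-1",
    "spark.executor.instances": "6",
}

final_config = compose_config(
    cluster_pool_defaults,
    cluster_defaults,
    user_config,
    cluster_namespace_overrides,
)
```

Applying the overrides last lets operators cap or redirect a job (here the executor count and namespace) without editing the user's own settings.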

Controllers & Watchers
● Job router + scheduler
● Namespace group controller
● Config composer
● Service controllers (STS, Jupyter/Zeppelin)
● K8s metrics & events watchers
● Spark job/CRD events & metrics watchers

Monitoring and Logging Toolbox
● Heka
● JMX

Monitoring Example - OOM Kill
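
As one possible shape for such a check, a watcher can inspect a pod's status as returned by the Kubernetes API, where a container killed for exceeding its memory limit reports a terminated state with reason `OOMKilled`. The `oom_killed_containers` helper below is a hypothetical sketch, not the monitoring shown in the talk.

```python
from typing import Any, Dict, List


def oom_killed_containers(pod: Dict[str, Any]) -> List[str]:
    """Return names of containers whose current or last state is OOMKilled."""
    killed = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        # A restarted container keeps the kill reason under lastState.
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                killed.append(status["name"])
                break
    return killed
```

A watcher could run this over pod update events and emit a metric or alert per affected executor.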

Provision & Automation
● Kustomize Template
● K8s Deploy
● Sidecar injectors
● Secrets injectors
● DaemonSets
● KIAM

Remaining work
● More intelligent and resilient job routing, scheduling, and
parameter setting
● Serverless and self-service user experiences for
any-to-any batch data compute
● Finer grained cost attribution
● Improved docker image distribution
● Spark 3.0 & Kubernetes v1.14+

Key Takeaways
● Apache Spark can help unify different batch data compute
use cases
● Kubernetes can help solve the dependency and multi-version
requirements using its containerized approach
● Spark on Kubernetes can scale significantly by using a
multi-cluster compute mesh approach with proper resource
isolation and scheduling techniques
● Challenges remain when running Spark on Kubernetes at
scale

Community
This effort would not be possible
without the help from the open
source and wider communities.

Q&A
Li Gao in/ligao101
Rohit Menon @_rohitmenon
