Apple logo is a trademark of Apple Inc.
Holden Karau | Data / AI Summit
@holdenkarau

Improving Spark for Dynamic Allocation & Spot Instances
Who am I?
• Holden Karau
• She / her
• Apache Spark PMC
• Contributor to a lot of other projects
• Co-author of High Performance Spark, Learning Spark, and Kubeflow for Machine Learning
• http://bit.ly/holdenSparkVideos
• https://youtube.com/user/holdenkarau
Let us start at the beginning
• Spark achieves resilience through re-computation, which is part of how we go fast
• This poses challenges when removing executors that may contain data
• We "solved" it for YARN/Mesos back in the day
• I drank waaaay too much coffee and came up with an alternative
• But no one really liked it because we didn't need it, so I closed the Google doc and forgot about it
• Don’t worry, we’ll get to the code soon :)
But then…
• The "cloud" became really popular
• Kubernetes became popular
• Everything caught on fire :/
Our Protagonist Remembers
• I started drinking a lot of coffee
• We dusted off that old design and wrote some code
• And then I got hit by a car
• More people wrote more code
• We had a VOTE
• We wrote waaaaay more code
• Everyone lived happily ever after?
Photo by Lukas from Pexels
How did DA (dynamic allocation) work on YARN?
• Scale up is "easy" (add more resources)
• Scale down required a stay-resident program on each YARN node to serve any files
• Spark stored its shuffle data as files
• Persisted in-memory data was still lost when scaling down an executor
Photo by Markus Spiske from Pexels
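On YARN, the stay-resident program is Spark's external shuffle service, which runs inside each NodeManager. A job that relies on it typically sets the two standard properties below; the small formatter that emits them as spark-defaults.conf lines is a hypothetical convenience, not part of Spark.

```python
"""Dynamic allocation on YARN, expressed as spark-defaults.conf entries.

The two property names are standard Spark settings; to_defaults_lines is a
hypothetical helper used only for illustration.
"""
YARN_DA_CONF = {
    # Let Spark add and remove executors based on the task backlog.
    "spark.dynamicAllocation.enabled": "true",
    # Serve shuffle files from the NodeManager-hosted shuffle service,
    # so an executor can be removed without losing its shuffle output.
    "spark.shuffle.service.enabled": "true",
}


def to_defaults_lines(conf):
    """Format {key: value} as 'key value' lines, sorted for stable output."""
    return [f"{key} {value}" for key, value in sorted(conf.items())]


if __name__ == "__main__":
    print("\n".join(to_defaults_lines(YARN_DA_CONF)))
```

The NodeManagers also need the `spark_shuffle` auxiliary service configured on the YARN side. This setup only protects shuffle files; cached blocks on a removed executor are still lost.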
Why did the cloud impact this?
• If you wanted the ~50% cost savings of spot/preemptible instances, you might lose entire machines
• Yes, Spark can "handle" this, but it does so by recomputing data (expensive)
• You can't depend on leaving a program around to serve files when the server is just gone
• So we need to find a way to migrate the data
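The recompute-versus-migrate trade-off can be seen in a toy model. Everything here (the `ToyCluster` class, the executor ids, the `expensive` function) is hypothetical and far simpler than Spark's block manager; it only counts how often a lost cached partition has to be rebuilt from its lineage.

```python
"""Toy model: what happens to a cached partition when its executor goes away.

Hypothetical illustration only; Spark's real block manager is far more involved.
"""


class ToyCluster:
    def __init__(self, compute_fn):
        self.compute_fn = compute_fn  # stands in for the partition's lineage
        self.blocks = {}              # executor id -> {partition id: data}
        self.recomputations = 0       # rebuilds caused by lost blocks

    def cache(self, executor, partition):
        self.blocks.setdefault(executor, {})[partition] = self.compute_fn(partition)

    def remove_executor(self, executor, migrate_to=None):
        lost = self.blocks.pop(executor, {})
        if migrate_to is not None:
            # Decommissioning: copy blocks to a surviving executor before shutdown.
            self.blocks.setdefault(migrate_to, {}).update(lost)

    def read(self, partition):
        for held in self.blocks.values():
            if partition in held:
                return held[partition]
        # Block is gone: fall back to recomputing it from lineage.
        self.recomputations += 1
        return self.compute_fn(partition)


def expensive(partition):
    """Placeholder for a costly computation."""
    return sum(i * i for i in range(partition * 1000))


if __name__ == "__main__":
    spot = ToyCluster(expensive)
    spot.cache("exec-1", 3)
    spot.remove_executor("exec-1")  # machine vanishes, nothing migrated
    spot.read(3)
    print("without migration:", spot.recomputations)  # → 1

    decom = ToyCluster(expensive)
    decom.cache("exec-1", 3)
    decom.remove_executor("exec-1", migrate_to="exec-2")
    decom.read(3)
    print("with migration:", decom.recomputations)  # → 0
```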
Ok sure, the cloud, but K8s?
• Kubernetes doesn't like the idea of scheduling a stay-resident program on every node
• Also, most people don't like the idea of a shared disk here either (across jobs/users)
• So we need to find a way to migrate the data
SPARK-20624
• Yee-haw!
• Ok, but more seriously, how does it work? Great question, let's open up the code
• BlockManagerDecommissioner.scala is where most of the magic happens
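To give a feel for the problem BlockManagerDecommissioner.scala addresses, here is a heavily simplified, hypothetical sketch in Python. It is not a port of the Scala code: it pushes every block off a decommissioning executor to live peers in round-robin order, retries failed transfers, and gives up on a block after a fixed number of attempts, leaving it to recomputation. The names `migrate_blocks`, `send` and `max_failures` are illustrative assumptions.

```python
"""Hypothetical, simplified sketch of decommission-time block migration.

This is NOT Spark's BlockManagerDecommissioner; it only illustrates the shape
of the problem.
"""
from collections import deque
from itertools import cycle


def migrate_blocks(blocks, peers, send, max_failures=3):
    """Try to move each block to some live peer.

    blocks: iterable of block ids on the decommissioning executor.
    peers: non-empty list of live peer ids.
    send: callable(block, peer) -> bool, True if the transfer succeeded.
    Returns (placements, abandoned): block -> peer, and blocks left to lineage.
    """
    if not peers:
        raise ValueError("need at least one live peer")
    pending = deque((block, 0) for block in blocks)
    peer_iter = cycle(peers)  # round-robin spreads load across peers
    placements, abandoned = {}, []
    while pending:
        block, failures = pending.popleft()
        peer = next(peer_iter)
        if send(block, peer):
            placements[block] = peer
        elif failures + 1 >= max_failures:
            abandoned.append(block)  # can still be recomputed later
        else:
            # Retry later; the round-robin may hand it to a different peer.
            pending.append((block, failures + 1))
    return placements, abandoned


if __name__ == "__main__":
    def send(block, peer):
        return peer != "bad"  # pretend one peer rejects every transfer

    placed, dropped = migrate_blocks(["shuffle_0", "shuffle_1", "rdd_2"], ["good", "bad"], send)
    print(placed, dropped)
```

Spreading blocks across peers avoids overloading a single receiver, and capping retries keeps a shutting-down executor from stalling indefinitely; anything abandoned is no worse off than before migration existed. The real decommissioner's policies may well differ.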
Collaboration
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Decommissioning-SPIP-td29701.html

https://github.com/apache/spark/pulls?q=is%3Apr+decommission+is%3Aclosed+
Ok what about the car?
Getting hit by a car sucks a lot
Slowed down dev work while I did rehab to be able
to walk & type again
Shout out to everyone who helped me recover
(from my wife, girlfriend, partners, my friends, to
the hospital staff, nursing home, PT, OT,
Ambulance, my employer for giving me time off,
the Spark community for understanding I needed
time off <3)
It’s early, though, so please be careful
On a Happy Note: You can try this now
• Enable the following:
- spark.decommission.enabled
- spark.storage.decommission.enabled
- spark.storage.decommission.rddBlocks.enabled
- spark.storage.decommission.shuffleBlocks.enabled
• Want to get fancy? Optionally enable:
- spark.shuffle.externalStorage.enabled
- And configure a storage backend (spark.shuffle.externalStorage.backend)
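One way to pass the four required flags above is as `--conf` arguments to spark-submit. The helper below is a hypothetical convenience; only the property names come from the slide, and `my_job.py` is a placeholder application.

```python
"""Render the decommissioning flags from the slide as spark-submit arguments.

Hypothetical helper; only the four property names come from the slide.
"""
DECOMMISSION_CONF = {
    "spark.decommission.enabled": "true",
    "spark.storage.decommission.enabled": "true",
    "spark.storage.decommission.rddBlocks.enabled": "true",
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
}


def submit_args(conf):
    """Turn {key: value} into ['--conf', 'key=value', ...] in a stable order."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # my_job.py is a placeholder for your application.
    print(" ".join(["spark-submit", *submit_args(DECOMMISSION_CONF), "my_job.py"]))
```

From PySpark, the same pairs can instead be applied with `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.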
Future work
• Heuristics to migrate data
• Improve container pre-emption selection
• Better heuristics around when to scale containers up and down
Please review this talk :)
TM and © 2021 Apple Inc. All rights reserved.