WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Powering Custom Apps at Facebook
using Spark Script Transformation
Abdulrahman Alfozan
Spark Summit Europe
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
2015
Small Scale
Experiments
2016
Few Pipelines in
Production
2017
Running 60TB+
shuffle pipelines
2018
Full-production
deployment
Successor to Apache
Hive at Facebook
2019
Scaling Spark
Largest Compute
Engine at Facebook
by CPU
Spark at Facebook
Reliability and efficiency are our top priorities
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
Script Transforms
SQL query
SELECT
TRANSFORM (inputs)
USING 'script'
AS (outputs)
FROM src_tbl;
Script Transforms
SQL query
ScriptTransformation (inputs,
script, outputs)
TableScan (src_tbl)
SELECT
TRANSFORM (inputs)
USING 'script'
AS (outputs)
FROM src_tbl;
Query plan
Script Transforms
ScriptTransformation (inputs,
script, outputs)
TableScan (src_tbl)
SQL query Query plan
Spark External
Process
Input Table
Output Table
inputs
outputs
Execution
SELECT
TRANSFORM (inputs)
USING 'script'
AS (outputs)
FROM src_tbl;
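The execution contract above is simple: Spark pipes serialized input rows to the script's stdin and reads serialized output rows back from its stdout. A minimal sketch of such a transform executable (illustrative only, using the default tab-delimited text row format rather than FB's Transformer library; the row-pump loop is factored into a function so the I/O streams can be swapped):

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split one serialized input row into its fields (default field delimiter
// for script transforms is the tab character).
std::vector<std::string> split_fields(const std::string& line, char delim = '\t') {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// The whole lifecycle of a transform process: read rows until stdin closes,
// emit one output row per line. In production `in` is stdin, `out` is stdout.
void run_transform(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        auto fields = split_fields(line);
        // "Transform": emit the row's field count followed by the row itself.
        out << fields.size() << '\t' << line << '\n';
    }
}

// int main() { run_transform(std::cin, std::cout); }
```

Spark would invoke such a binary via USING './my_transform' (a hypothetical name); any language works as long as the process honors the stdin/stdout row protocol.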
1. Flexibility:
Unlike UDFs, transforms allow unlimited use-cases
2. Efficiency:
Most transformers are written in C++
Why Script Transforms?
1. Flexibility:
Unlike UDFs, transforms allow unlimited use-cases
2. Efficiency:
Most transformers are written in C++
Why Script Transforms?
Transforms provide custom data processing while relying on Spark for
ETL, data partitioning, distributed execution, and fault-tolerance.
1. Flexibility:
Unlike UDFs, transforms allow unlimited use-cases
2. Efficiency:
Most transformers are written in C++
Why Script Transforms?
Transforms provide custom data processing while relying on Spark for
ETL, data partitioning, distributed execution, and fault-tolerance.
e.g. Spark is optimized for ETL. PyTorch is optimized for model serving.
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
Transform Pipelines Usage
[Chart: transform pipelines' share of overall CPU; y-axis 0%–15%]

Transform Pipelines Usage
Query Count vs. CPU Comparison

                    Count   CPU
Pure SQL             54%    72%
Transforms & UDFs    45%    20%
DataFrames            1%     8%
Use-case 1: Batch Inference
SQL Query
Transform resources:
ADD FILES inference_engine, model.md;
SELECT
TRANSFORM (id INT, metadata STRING, image STRING)
ROW FORMAT SERDE 'JSONSimpleSerDe'
USING 'inference_engine --model=model.md'
AS labels MAP<STRING, DOUBLE>
ROW FORMAT SERDE 'JSONSimpleSerDe'
FROM tlb_images;
Output: category → confidence map
Input columns
Input format
Output format
Use-case 1: Batch Inference
Transform main.cpp
#include "spark/Transformer.h"
...
while (transformer.readRow(input)) {
// data processing
auto prediction = predict(input);
// write output map
transformer.writeRow(prediction);
}
Transform lib
Row iterator
Use-case 1: Batch Inference
PyTorch runtime container
Self-contained Executable
Spark Executor
Transform Process
stdin
stdout
Spark Task
InternalRow
Serialization into JSON
JSON deserialization
into InternalRow
JSON deserialization into
C++ objects
C++ objects
serialization into JSON
{id:1, metadata:, image:…}
{label_1: score, label_2: score}
Model
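One way to picture the output leg of this flow: the transform emits one JSON object per row ({label_1: score, label_2: score}), which Spark deserializes back into an InternalRow. A sketch of that serialization step (the exact escaping rules of FB's JSONSimpleSerDe are not shown in the slides; this assumes plain ASCII label keys):

```cpp
#include <map>
#include <sstream>
#include <string>

// Serialize a prediction map into one JSON object per row,
// e.g. {"cat": 0.9, "dog": 0.1}. Key escaping is omitted for brevity.
std::string to_json_row(const std::map<std::string, double>& labels) {
    std::ostringstream out;
    out << '{';
    bool first = true;
    for (const auto& kv : labels) {
        if (!first) out << ", ";
        first = false;
        out << '"' << kv.first << "\": " << kv.second;
    }
    out << '}';
    return out.str();
}
```

Each such line written to stdout becomes one output row with a MAP<STRING, DOUBLE> column on the Spark side.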
Use-case 2: Batch Indexing
SQL Query
Transform resources:
ADD FILES indexer;
SELECT
TRANSFORM (shard_id INT, data STRING)
ROW FORMAT SERDE 'RowFormatDelimited'
USING 'indexer --schema=data<STRING>'
FROM src_tbl
CLUSTER BY shard_id;
Input columns
Input format
Partition operator
Use-case 2: Batch Indexing
Execution

Mapper 1 (Spark Task)     Mapper 2 (Spark Task)
shard_id  data            shard_id  data
1         {…}             1         {…}
1         {…}             2         {…}
2         {…}             2         {…}

        Shuffle (CLUSTER BY shard_id)

Reducer 1                 Reducer 2
shard_id  data            shard_id  data
1         {…}             2         {…}
1         {…}             2         {…}
1         {…}             2         {…}

Reducer Transforms: each reducer streams its shard’s rows over stdin into the indexer Transform Process.
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
ScriptTransformationExec.scala
• Direct process invocation
• Class IOSchema to handle SerDe schema and config
• MonitorThread to track transform process progress
• Transform process error handling and surfacing
Core Engine Improvements
Operator
• DelimitedJSONSerDe.scala
JSON format standard RFC 8259
Core Engine Improvements
SerDe support
• SimpleSerDe.scala
ROW FORMAT DELIMITED
Core Engine Improvements
SerDe support
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
Configurable properties
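With the delimiters above, a row containing a MAP column serializes as plain delimited text. A sketch of that encoding (escaping of delimiter characters inside values is a detail the slides don't cover, so it is omitted here):

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Sketch of ROW FORMAT DELIMITED encoding using the separators from the
// slide: ',' between fields, '|' between collection items, ':' between
// map keys and values.
std::string encode_row(const std::vector<std::string>& fields,
                       const std::map<std::string, std::string>& map_col) {
    std::ostringstream out;
    for (const auto& f : fields) out << f << ',';   // scalar fields
    bool first = true;
    for (const auto& kv : map_col) {                // trailing MAP column
        if (!first) out << '|';
        first = false;
        out << kv.first << ':' << kv.second;
    }
    return out.str();
}
```

For example, a row with fields {"1", "alice"} and map {a→1, b→2} encodes as `1,alice,a:1|b:2`, terminated by the configured line delimiter.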
Core Engine Improvements
SerDe support
Development
• Text-based
DelimitedJSONSerDe.scala
SimpleSerDe.scala
• Binary
?
Production
• Binary format
Text-based encoding is slow and less compact
Core Engine Improvements
Production SerDe Requirements
• Binary format
Text-based encoding is slow and less compact
• Zero-copy
Access to serialized data without parsing or unpacking
Improving Facebook’s performance on Android with FlatBuffers
Core Engine Improvements
Production SerDe Requirements
• Binary format
Text-based encoding is slow and less compact
• Zero-copy
Access to serialized data without parsing or unpacking
Improving Facebook’s performance on Android with FlatBuffers
• Word-aligned data
Allows SIMD optimizations
Core Engine Improvements
Production SerDe Requirements
• LazyBinarySerDe (Apache Hive)
Neither zero-copy nor word-aligned; requires converters in Spark
• Protocol Buffers / Thrift
Not zero-copy; better suited for RPC
• Flatbuffers / Cap’n Proto
Require converters (to/from InternalRow) in Spark Core
• Apache Arrow
A promising future option
Binary SerDe Considerations
UnsafeRow
• Binary & Word-aligned
• Zero-copy
• Already part of Spark core
• Available converters to/from InternalRow
Binary SerDe Considerations
Chosen format
UnsafeRow SerDe
SPARK-7076: Introduced UnsafeRow format to Spark
apache/spark/sql/catalyst/expressions/UnsafeRow.java
UnsafeRow SerDe
SPARK-15962: Introduced UnsafeArrayData and UnsafeMapData
apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
apache/spark/sql/catalyst/expressions/UnsafeMapData.java
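UnsafeRow's appeal follows directly from its layout: a word-aligned null bitset followed by one 8-byte slot per field, so fixed-width values are read in place with no parsing. A simplified sketch of that layout (real UnsafeRow also sets null bits and appends variable-length data for STRING/ARRAY/MAP after the fixed-width region; this sketch covers only non-null primitives):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Fixed-width UnsafeRow layout sketch: an 8-byte-aligned null bitset
// (1 word per 64 fields), then one 8-byte slot per field. A value lives at
// byte offset 8 * (bitsetWords + ordinal) -- zero-copy, word-aligned access.
struct UnsafeRowSketch {
    int words;                       // null-bitset words
    std::vector<uint8_t> bytes;      // the whole row as one contiguous buffer

    explicit UnsafeRowSketch(int numFields)
        : words((numFields + 63) / 64),
          bytes(8 * (words + numFields), 0) {}

    // An INT occupies the low 4 bytes of its slot, a BIGINT the full slot.
    void setInt(int ordinal, int32_t v)  { std::memcpy(&bytes[8 * (words + ordinal)], &v, 4); }
    void setLong(int ordinal, int64_t v) { std::memcpy(&bytes[8 * (words + ordinal)], &v, 8); }
    int32_t getInt(int ordinal) const {
        int32_t v; std::memcpy(&v, &bytes[8 * (words + ordinal)], 4); return v;
    }
    int64_t getLong(int ordinal) const {
        int64_t v; std::memcpy(&v, &bytes[8 * (words + ordinal)], 8); return v;
    }
};
```

A two-field row (INT, BIGINT) is exactly 24 bytes: one bitset word plus two value slots. This fixed arithmetic is what lets a C++ SerDe read Spark's bytes without converters.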
UnsafeRow SerDe
UnsafeRow SerDe C++ library
INT
BIGINT
BOOLEAN
FLOAT
DOUBLE
STRING
ARRAY<INT>
MAP<INT,STRING>
int32_t
int64_t
bool
float
double
unsaferow::String
unsaferow::List<int32_t>
unsaferow::Map<int32_t, unsaferow::String>
SQL datatypes C++ datatypes
UnsafeRow SerDe
UnsafeRow SerDe C++ library
SELECT
TRANSFORM (id INT)
ROW FORMAT SERDE 'UnsafeRowSerDe'
USING 'script'
AS (value BIGINT)
ROW FORMAT SERDE 'UnsafeRowSerDe'
FROM src_tbl;
#include "spark/Transformer.h"
while (transformer.readRow(input)) {
// data processing
int32_t id = input->getID();
output->setValue(id*id);
// write output
transformer.writeRow(output);
}
SQL Query C++ Transformer
Core Engine Improvements
SerDe support summary
Development
Production
• Text-based
DelimitedJSONSerDe.scala
SimpleSerDe.scala
• Binary
UnsafeRowSerDe.scala
Core Engine Improvements
SELECT
TRANSFORM (id, AVG(value) AS value_avg)
USING 'script'
AS (output)
FROM src_tbl
GROUP BY id;
Aggregation and projection support (SQL)
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
• Text-based (UTF-8)
- JSON
- Row Format Delimited
• Binary:
- UnsafeRow
Efficiency Analysis
SerDe overhead
JSON lib
Efficiency Analysis
Text-SerDe CPU overhead: Spark
Efficiency Analysis
Text-SerDe CPU overhead: Transform process
• Text-based SerDe overhead is non-negligible,
especially for complex types
• SerDe cost can be up to 70% of a pipeline’s CPU resources
Efficiency Analysis
SerDe overhead
• Text-based SerDe overhead is non-negligible,
especially for complex types
• SerDe cost can be up to 70% of a pipeline’s CPU resources
Solution: use an efficient binary SerDe
Efficiency Analysis
SerDe overhead
Efficiency Analysis: UnsafeRow
Efficient Binary SerDe
UnsafeRow
C++ lib
Efficiency Analysis: UnsafeRow
Spark
Efficiency Analysis: UnsafeRow
Transform process
UnsafeRow SerDe Benchmark
Text-Based SerDe vs Binary
SerDe (UnsafeRow)
Transform pipelines end-to-end CPU savings: up to 4x
Complex types’ SerDe benefited the most
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
CPU cores per container: spark.executor.cores = 4
Memory per container: spark.executor.memory=4GB + spark.transform.memory=4GB
Transforms Execution Model
Resource Request
Spark
Driver
Cluster
Manager
Node Manager
Node Manager
Executor
Task 1 Task 2
Resource Request
CPU cores = 4,
Memory = 8GB
Launch Spark
Executor
Process 1 Process 2
Executor
• JVM’s memory limits: -Xms, -Xmx, and -Xss
• CPU threads:
spark.executor.cores, spark.task.cpus
Transforms Execution Model
Resource Control
• JVM’s memory limits: -Xms, -Xmx, and -Xss
• CPU threads:
spark.executor.cores, spark.task.cpus
These limits are irrelevant when running an
external process!
Transforms Execution Model
Resource Control
• JVM’s memory limits: -Xms, -Xmx, and -Xss
• CPU threads:
spark.executor.cores, spark.task.cpus
These limits are irrelevant when running an
external process!
Solution: cgroup v2 containers
Transforms Execution Model
Resource Control
cgroup v2 controllers:
• cpu.weight
Allows multi-threaded transforms
• memory.max
OOM-kills offending processes
• io.latency
IO QoS
Transforms Execution Model
Resource Control & Isolation
Transforms Execution Model
Resource Control & Isolation
/cgroup2/task_container/exec1
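The controllers above are plain files under a cgroup directory like the one shown, so enforcing limits amounts to writing values there before attaching the transform process via cgroup.procs. A hypothetical helper sketching this (the function name and argument defaults are illustrative; real use requires a mounted cgroup2 filesystem and sufficient privileges):

```cpp
#include <fstream>
#include <string>

// Write per-container limits into an existing cgroup v2 directory, e.g.
// /cgroup2/task_container/exec1. memory.max caps the process's memory
// (exceeding it triggers the OOM killer); cpu.weight sets its relative
// CPU share. This helper is a sketch, not FB's deployment code.
bool write_cgroup_limits(const std::string& cgroup_dir,
                         const std::string& memory_max,   // bytes, e.g. "4294967296"
                         const std::string& cpu_weight) { // relative share, e.g. "100"
    std::ofstream mem(cgroup_dir + "/memory.max");
    std::ofstream cpu(cgroup_dir + "/cpu.weight");
    if (!mem || !cpu) return false;
    mem << memory_max << '\n';
    cpu << cpu_weight << '\n';
    return static_cast<bool>(mem) && static_cast<bool>(cpu);
}
```

After writing the limits, echoing the transform process's PID into cgroup.procs in the same directory puts it under these controls, independent of any JVM setting.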
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
• Binary SerDe based on Apache Arrow
• Vectorization
Future Plans
Questions