WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Powering Custom Apps at Facebook
using Spark Script Transformation
Abdulrahman Alfozan
Spark Summit Europe
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
2015
Small Scale
Experiments
2016
Few Pipelines in
Production
2017
Running 60TB+
shuffle pipelines
2018
Full-production
deployment
Successor to Apache
Hive at Facebook
2019
Scaling Spark
Largest Compute
Engine at Facebook
by CPU
Spark at Facebook
Reliability and efficiency are our top priorities
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
Script Transforms
SQL query
SELECT
TRANSFORM (inputs)
USING 'script'
AS (outputs)
FROM src_tbl;
Script Transforms
SQL query
ScriptTransformation (inputs,
script, outputs)
TableScan (src_tbl)
SELECT
TRANSFORM (inputs)
USING 'script'
AS (outputs)
FROM src_tbl;
Query plan
Script Transforms
ScriptTransformation (inputs,
script, outputs)
TableScan (src_tbl)
SQL query Query plan
Spark External
Process
Input Table
Output Table
inputs
outputs
Execution
SELECT
TRANSFORM (inputs)
USING 'script'
AS (outputs)
FROM src_tbl;
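The execution contract above is simple: Spark pipes serialized input rows to the script's stdin and reads serialized output rows back from its stdout. A minimal sketch of such a transform executable (illustrative only, using the default tab-delimited text row format rather than FB's Transformer library; the row-pump loop is factored into a function so the I/O streams can be swapped):

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split one serialized input row into its fields (default field delimiter
// for script transforms is the tab character).
std::vector<std::string> split_fields(const std::string& line, char delim = '\t') {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// The whole lifecycle of a transform process: read rows until stdin closes,
// emit one output row per line. In production `in` is stdin, `out` is stdout.
void run_transform(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        auto fields = split_fields(line);
        // "Transform": emit the row's field count followed by the row itself.
        out << fields.size() << '\t' << line << '\n';
    }
}

// int main() { run_transform(std::cin, std::cout); }
```

Spark would invoke such a binary via USING './my_transform' (a hypothetical name); any language works as long as the process honors the stdin/stdout row protocol.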
1. Flexibility:
Unlike UDFs, transforms allow unlimited use-cases
2. Efficiency:
Most transformers are written in C++
Why Script Transforms?
1. Flexibility:
Unlike UDFs, transforms allow unlimited use-cases
2. Efficiency:
Most transformers are written in C++
Why Script Transforms?
Transforms provide custom data processing while relying on Spark for
ETL, data partitioning, distributed execution, and fault-tolerance.
1. Flexibility:
Unlike UDFs, transforms allow unlimited use-cases
2. Efficiency:
Most transformers are written in C++
Why Script Transforms?
Transforms provide custom data processing while relying on Spark for
ETL, data partitioning, distributed execution, and fault-tolerance.
e.g. Spark is optimized for ETL. PyTorch is optimized for model serving.
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
Transform Pipelines Usage
[Chart: transform pipelines' share of overall CPU; y-axis 0%–15%]

Transform Pipelines Usage
Query Count vs. CPU Comparison

                    Count   CPU
Pure SQL             54%    72%
Transforms & UDFs    45%    20%
DataFrames            1%     8%
Use-case 1: Batch Inference
SQL Query
Transform resources:
ADD FILES inference_engine, model.md;
SELECT
TRANSFORM (id INT, metadata STRING, image STRING)
ROW FORMAT SERDE 'JSONSimpleSerDe'
USING 'inference_engine --model=model.md'
AS labels MAP<STRING, DOUBLE>
ROW FORMAT SERDE 'JSONSimpleSerDe'
FROM tlb_images;
Output: category → confidence map
Input columns
Input format
Output format
Use-case 1: Batch Inference
Transform main.cpp
#include "spark/Transformer.h"
...
while (transformer.readRow(input)) {
// data processing
auto prediction = predict(input);
// write output map
transformer.writeRow(prediction);
}
Transform lib
Row iterator
Use-case 1: Batch Inference
PyTorch runtime container
Self-contained Executable
Spark Executor
Transform Process
stdin
stdout
Spark Task
InternalRow
Serialization into JSON
JSON deserialization
into InternalRow
JSON deserialization into
C++ objects
C++ objects
serialization into JSON
{id:1, metadata:, image:…}
{label_1: score, label_2: score}
Model
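One way to picture the output leg of this flow: the transform emits one JSON object per row ({label_1: score, label_2: score}), which Spark deserializes back into an InternalRow. A sketch of that serialization step (the exact escaping rules of FB's JSONSimpleSerDe are not shown in the slides; this assumes plain ASCII label keys):

```cpp
#include <map>
#include <sstream>
#include <string>

// Serialize a prediction map into one JSON object per row,
// e.g. {"cat": 0.9, "dog": 0.1}. Key escaping is omitted for brevity.
std::string to_json_row(const std::map<std::string, double>& labels) {
    std::ostringstream out;
    out << '{';
    bool first = true;
    for (const auto& kv : labels) {
        if (!first) out << ", ";
        first = false;
        out << '"' << kv.first << "\": " << kv.second;
    }
    out << '}';
    return out.str();
}
```

Each such line written to stdout becomes one output row with a MAP<STRING, DOUBLE> column on the Spark side.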
Use-case 2: Batch Indexing
SQL Query
Transform resources:
ADD FILES indexer;
SELECT
TRANSFORM (shard_id INT, data STRING)
ROW FORMAT SERDE 'RowFormatDelimited'
USING 'indexer --schema=data<STRING>'
FROM src_tbl
CLUSTER BY shard_id;
Input columns
Input format
Partition operator
Use-case 2: Batch Indexing
Execution

Mapper 1 (Spark Task)     Mapper 2 (Spark Task)
shard_id  data            shard_id  data
1         {…}             1         {…}
1         {…}             2         {…}
2         {…}             2         {…}

        Shuffle (CLUSTER BY shard_id)

Reducer 1                 Reducer 2
shard_id  data            shard_id  data
1         {…}             2         {…}
1         {…}             2         {…}
1         {…}             2         {…}

Reducer Transforms: each reducer streams its shard’s rows over stdin into the indexer Transform Process.
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
ScriptTransformationExec.scala
• Direct process invocation
• Class IOSchema to handle SerDe schema and config
• MonitorThread to track transform process progress
• Transform process error handling and surfacing
Core Engine Improvements
Operator
• DelimitedJSONSerDe.scala
JSON format standard RFC 8259
Core Engine Improvements
SerDe support
• SimpleSerDe.scala
ROW FORMAT DELIMITED
Core Engine Improvements
SerDe support
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
Configurable properties
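With the delimiters above, a row containing a MAP column serializes as plain delimited text. A sketch of that encoding (escaping of delimiter characters inside values is a detail the slides don't cover, so it is omitted here):

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Sketch of ROW FORMAT DELIMITED encoding using the separators from the
// slide: ',' between fields, '|' between collection items, ':' between
// map keys and values.
std::string encode_row(const std::vector<std::string>& fields,
                       const std::map<std::string, std::string>& map_col) {
    std::ostringstream out;
    for (const auto& f : fields) out << f << ',';   // scalar fields
    bool first = true;
    for (const auto& kv : map_col) {                // trailing MAP column
        if (!first) out << '|';
        first = false;
        out << kv.first << ':' << kv.second;
    }
    return out.str();
}
```

For example, a row with fields {"1", "alice"} and map {a→1, b→2} encodes as `1,alice,a:1|b:2`, terminated by the configured line delimiter.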
Core Engine Improvements
SerDe support
Development
• Text-based
DelimitedJSONSerDe.scala
SimpleSerDe.scala
• Binary
?
Production
• Binary format
Text-based encoding is slow and less compact
Core Engine Improvements
Production SerDe Requirements
• Binary format
Text-based encoding is slow and less compact
• Zero-copy
Access to serialized data without parsing or unpacking
Improving Facebook’s performance on Android with FlatBuffers
Core Engine Improvements
Production SerDe Requirements
• Binary format
Text-based encoding is slow and less compact
• Zero-copy
Access to serialized data without parsing or unpacking
Improving Facebook’s performance on Android with FlatBuffers
• Word-aligned data
Allows SIMD optimizations
Core Engine Improvements
Production SerDe Requirements
• LazyBinarySerDe (Apache Hive)
Neither zero-copy nor word-aligned; requires converters in Spark
• Protocol Buffers / Thrift
Not zero-copy; better suited for RPC
• Flatbuffers / Cap’n Proto
Require converters (to/from InternalRow) in Spark Core
• Apache Arrow
A promising future option
Binary SerDe Considerations
UnsafeRow
• Binary & Word-aligned
• Zero-copy
• Already part of Spark core
• Available converters to/from InternalRow
Binary SerDe Considerations
Chosen format
UnsafeRow SerDe
SPARK-7076: Introduced UnsafeRow format to Spark
apache/spark/sql/catalyst/expressions/UnsafeRow.java
UnsafeRow SerDe
SPARK-15962: Introduced UnsafeArrayData and UnsafeMapData
apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
apache/spark/sql/catalyst/expressions/UnsafeMapData.java
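UnsafeRow's appeal follows directly from its layout: a word-aligned null bitset followed by one 8-byte slot per field, so fixed-width values are read in place with no parsing. A simplified sketch of that layout (real UnsafeRow also sets null bits and appends variable-length data for STRING/ARRAY/MAP after the fixed-width region; this sketch covers only non-null primitives):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Fixed-width UnsafeRow layout sketch: an 8-byte-aligned null bitset
// (1 word per 64 fields), then one 8-byte slot per field. A value lives at
// byte offset 8 * (bitsetWords + ordinal) -- zero-copy, word-aligned access.
struct UnsafeRowSketch {
    int words;                       // null-bitset words
    std::vector<uint8_t> bytes;      // the whole row as one contiguous buffer

    explicit UnsafeRowSketch(int numFields)
        : words((numFields + 63) / 64),
          bytes(8 * (words + numFields), 0) {}

    // An INT occupies the low 4 bytes of its slot, a BIGINT the full slot.
    void setInt(int ordinal, int32_t v)  { std::memcpy(&bytes[8 * (words + ordinal)], &v, 4); }
    void setLong(int ordinal, int64_t v) { std::memcpy(&bytes[8 * (words + ordinal)], &v, 8); }
    int32_t getInt(int ordinal) const {
        int32_t v; std::memcpy(&v, &bytes[8 * (words + ordinal)], 4); return v;
    }
    int64_t getLong(int ordinal) const {
        int64_t v; std::memcpy(&v, &bytes[8 * (words + ordinal)], 8); return v;
    }
};
```

A two-field row (INT, BIGINT) is exactly 24 bytes: one bitset word plus two value slots. This fixed arithmetic is what lets a C++ SerDe read Spark's bytes without converters.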
UnsafeRow SerDe
UnsafeRow SerDe C++ library
INT
BIGINT
BOOLEAN
FLOAT
DOUBLE
STRING
ARRAY<INT>
MAP<INT,STRING>
int32_t
int64_t
bool
float
double
unsaferow::String
unsaferow::List<int32_t>
unsaferow::Map<int32_t, unsaferow::String>
SQL datatypes C++ datatypes
UnsafeRow SerDe
UnsafeRow SerDe C++ library
SELECT
TRANSFORM (id INT)
ROW FORMAT SERDE 'UnsafeRowSerDe'
USING 'script'
AS (value BIGINT)
ROW FORMAT SERDE 'UnsafeRowSerDe'
FROM src_tbl;
#include "spark/Transformer.h"
while (transformer.readRow(input)) {
// data processing
int32_t id = input->getID();
output->setValue(id*id);
// write output
transformer.writeRow(output);
}
SQL Query C++ Transformer
Core Engine Improvements
SerDe support summary
Development
Production
• Text-based
DelimitedJSONSerDe.scala
SimpleSerDe.scala
• Binary
UnsafeRowSerDe.scala
Core Engine Improvements
SELECT
TRANSFORM (id, AVG(value) AS value_avg)
USING 'script'
AS (output)
FROM src_tbl
GROUP BY id;
Aggregation and projection support (SQL)
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
• Text-based (UTF-8)
- JSON
- Row Format Delimited
• Binary:
- UnsafeRow
Efficiency Analysis
SerDe overhead
JSON lib
Efficiency Analysis
Text-SerDe CPU overhead: Spark
Efficiency Analysis
Text-SerDe CPU overhead: Transform process
• Text-based SerDe overhead is non-negligible,
especially for complex types
• SerDe cost can be up to 70% of a pipeline’s CPU resources
Efficiency Analysis
SerDe overhead
• Text-based SerDe overhead is non-negligible,
especially for complex types
• SerDe cost can be up to 70% of a pipeline’s CPU resources
Solution: use an efficient binary SerDe
Efficiency Analysis
SerDe overhead
Efficiency Analysis: UnsafeRow
Efficient Binary SerDe
UnsafeRow
C++ lib
Efficiency Analysis: UnsafeRow
Spark
Efficiency Analysis: UnsafeRow
Transform process
UnsafeRow SerDe Benchmark
Text-Based SerDe vs Binary
SerDe (UnsafeRow)
Transform pipelines end-to-end CPU savings: up to 4x
Complex types’ SerDe benefited the most
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
CPU cores per container: spark.executor.cores = 4
Memory per container: spark.executor.memory=4GB + spark.transform.memory=4GB
Transforms Execution Model
Resource Request
Spark
Driver
Cluster
Manager
Node Manager
Node Manager
Executor
Task 1 Task 2
Resource Request
CPU cores = 4,
Memory = 8GB
Launch Spark
Executor
Process 1 Process 2
Executor
• JVM’s memory limits: -Xms, -Xmx, and -Xss
• CPU threads:
spark.executor.cores, spark.task.cpus
Transforms Execution Model
Resource Control
• JVM’s memory limits: -Xms, -Xmx, and -Xss
• CPU threads:
spark.executor.cores, spark.task.cpus
These limits are irrelevant when running an
external process!
Transforms Execution Model
Resource Control
• JVM’s memory limits: -Xms, -Xmx, and -Xss
• CPU threads:
spark.executor.cores, spark.task.cpus
These limits are irrelevant when running an
external process!
Solution: cgroup v2 containers
Transforms Execution Model
Resource Control
cgroup v2 controllers:
• cpu.weight
Allows multi-threaded transforms
• memory.max
OOM-kills offending processes
• io.latency
IO QoS
Transforms Execution Model
Resource Control & Isolation
Transforms Execution Model
Resource Control & Isolation
/cgroup2/task_container/exec1
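The controllers above are plain files under a cgroup directory like the one shown, so enforcing limits amounts to writing values there before attaching the transform process via cgroup.procs. A hypothetical helper sketching this (the function name and argument defaults are illustrative; real use requires a mounted cgroup2 filesystem and sufficient privileges):

```cpp
#include <fstream>
#include <string>

// Write per-container limits into an existing cgroup v2 directory, e.g.
// /cgroup2/task_container/exec1. memory.max caps the process's memory
// (exceeding it triggers the OOM killer); cpu.weight sets its relative
// CPU share. This helper is a sketch, not FB's deployment code.
bool write_cgroup_limits(const std::string& cgroup_dir,
                         const std::string& memory_max,   // bytes, e.g. "4294967296"
                         const std::string& cpu_weight) { // relative share, e.g. "100"
    std::ofstream mem(cgroup_dir + "/memory.max");
    std::ofstream cpu(cgroup_dir + "/cpu.weight");
    if (!mem || !cpu) return false;
    mem << memory_max << '\n';
    cpu << cpu_weight << '\n';
    return static_cast<bool>(mem) && static_cast<bool>(cpu);
}
```

After writing the limits, echoing the transform process's PID into cgroup.procs in the same directory puts it under these controls, independent of any JVM setting.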
Agenda
1. Intro to Spark Script Transforms
2. Spark Transforms at Facebook
3. Core Engine Improvements
4. Efficiency Analysis and Results
5. Transforms Execution Model
6. Future Plans
• Binary SerDe based on Apache Arrow
• Vectorization
Future Plans
Questions