Apollo
James Burkhart
Uber - Staff Engineer
Agenda
- Motivation
- Ingest
- Storage
- Query
Motivation
- Business Intelligence
- Real-time
- Time series aggregates
- Geospatial
What is Apollo?
- Real-time analytics platform focused on:
  - Recent data (~7 weeks)
  - Immediate visibility (1500 ms to 3 minute p99 ingest latency)
  - Ad-hoc queryability
  - Arbitrary drilldown
  - Geospatial functionality
  - Data correctness/deduplication (exactly-once)
  - Extremely low-latency queries (<100 ms p95, <1 s p99)
- Powering internal data tools at Uber
Real-time operational analytics dashboarding
- Used weekly by the majority of Operations
Apollo Query Builder
- Web UI for Apollo Query Language
- Fully interactive
NYE 2016-2017
Motivation, Functionality Requirements
- Index based on data timestamp, not arrival timestamp
- Out-of-order and late (up to days later) arrival
- Mutability
- Sub-linear performance impact of scaling QPS
Apollo architecture (diagram)
Environment Management
(MemSQL cluster sizes across Datacenter 1 and Datacenter 2)
- Production Prime: 33x 256GB
- Production Prime 2: 43x 256GB
- Production Minor: 5x 256GB
- Production Minor 2: 7x 256GB
- Staging/Preprod: 25x 256GB (mirrored)
Ingestion
● Simple transformations
○ (e.g. string UUID to binary representation)
■ "123e4567-e89b-12d3-a456-426655440000": at least 36 bytes
■ 0x123E4567E89B12D3A456426655440000: 16 bytes
● Filters
● Each job maps one input stream to one or more output tables
● Independent job instance per environment
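The UUID transformation above can be sketched with Python's standard uuid module. This is a minimal illustration, not Apollo's ingestion code. The helper name uuid_to_binary is hypothetical.

```python
import uuid

def uuid_to_binary(text: str) -> bytes:
    """Convert a canonical 36-character UUID string to its 16-byte binary form."""
    return uuid.UUID(text).bytes  # big-endian 16 bytes

s = "123e4567-e89b-12d3-a456-426655440000"
b = uuid_to_binary(s)
print(len(s.encode("ascii")), len(b))  # 36 16
print(b.hex().upper())                 # 123E4567E89B12D3A456426655440000
```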
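// Scala-style pseudocode: one Kafka input stream fanned out to each output table
val inputStream = KafkaInputStream(topic)
job.outputTables.foreach { outputTable =>
  inputStream
    .filter(...)                     // drop unwanted events
    .map(...)                        // transformations -> SQL row
    .grouped(outputTable.batchSize)  // batch rows for this table
    .foreach(writeBatchToDatabase)   // write (upsert) each batch
}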
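Here is a minimal Python sketch of the same per-table fan-out (filter, transform, batch, write). It uses an in-memory list of events in place of the Kafka stream, because a real stream could not simply be re-iterated once per table. The names run_job, grouped, and the table dictionaries are hypothetical, and write_batch stands in for writeBatchToDatabase.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def grouped(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive batches of at most `size` rows (like Scala's grouped)."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def run_job(events: List[Row], output_tables: List[dict],
            write_batch: Callable[[str, List[Row]], None]) -> None:
    # Each output table gets its own filter -> transform -> batch -> write pipeline.
    for table in output_tables:
        rows = (table["transform"](e) for e in events if table["filter"](e))
        for batch in grouped(rows, table["batch_size"]):
            write_batch(table["name"], batch)

written = []
tables = [{"name": "trips",
           "filter": lambda e: e["status"] != "ignored",
           "transform": lambda e: {"uuid": e["uuid"], "status": e["status"]},
           "batch_size": 2}]
events = [{"uuid": i, "status": s}
          for i, s in enumerate(["completed", "ignored", "completed", "canceled"])]
run_job(events, tables, lambda name, batch: written.append((name, len(batch))))
print(written)  # [('trips', 2), ('trips', 1)]
```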
Ingestion
● Upserts - No double counting!
● Asynchronous MemSQL replication (RF=2)
○ Can lose recent writes during a hardware failure
● Solution: every 6 hours, batch-upsert the last 72 hours of data from Hive
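A dictionary keyed by primary key shows why upserts make both duplicate delivery and the periodic Hive backfill safe. This is an illustrative model only. It assumes each row has a unique uuid key. In a MySQL-compatible database, the analogous statement would be an INSERT ... ON DUPLICATE KEY UPDATE against that key, but the slides do not say which statement Apollo uses. The upsert helper is hypothetical.

```python
from typing import Dict, Iterable

Row = Dict[str, object]

def upsert(table: Dict[object, Row], rows: Iterable[Row], key: str = "uuid") -> None:
    """Insert or overwrite rows by primary key, so replaying rows is idempotent."""
    for row in rows:
        table[row[key]] = row

table: Dict[object, Row] = {}
live = [{"uuid": "a", "status": "completed"}, {"uuid": "b", "status": "requested"}]
upsert(table, live)   # real-time ingest from Kafka
upsert(table, live)   # duplicate delivery: still 2 rows, no double counting

# Hypothetical periodic backfill: re-upsert the last 72h from the batch source.
# This restores rows lost in a failover ("c") and applies later updates ("b").
backfill = [{"uuid": "a", "status": "completed"},
            {"uuid": "b", "status": "completed"},
            {"uuid": "c", "status": "completed"}]
upsert(table, backfill)
print(len(table), table["b"]["status"])  # 3 completed
```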
Storage
● In-memory rowstore - mutable/recent
● Columnstore - immutable/older
Caching
● Partial, recomposable results
● Sharded MySQLs
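One way to read "partial, recomposable results" is as a cache of per-time-bucket partial aggregates. These partials can be summed into larger ranges, so only uncached buckets hit the database. The sketch below assumes that interpretation. The names cached_total and fake_compute, and the (query key, bucket) cache layout, are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache: (canonical query key, bucket start) -> partial count for that bucket.
Cache = Dict[Tuple[str, int], int]

def cached_total(query_key: str, buckets: List[int], cache: Cache,
                 compute: Callable[[int], int]) -> int:
    """Recompose a total from per-bucket partials, computing only missing buckets."""
    total = 0
    for b in buckets:
        k = (query_key, b)
        if k not in cache:
            cache[k] = compute(b)  # e.g. run the per-bucket query against the database
        total += cache[k]
    return total

calls: List[int] = []

def fake_compute(bucket: int) -> int:
    calls.append(bucket)  # record which buckets actually had to be computed
    return 10

cache: Cache = {}
first = cached_total("trips:count", [0, 1, 2], cache, fake_compute)
second = cached_total("trips:count", [1, 2, 3], cache, fake_compute)
print(first, second, calls)  # 30 30 [0, 1, 2, 3]
```

A real cache would also need to skip or expire buckets that can still change, because data can arrive days late.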
Apollo Query Language (AQL)
● Custom Analytical Time-Series Query Language
● Goals:
○ Flexibility like SQL
○ Minimal Learning Curve
○ Ease-of-Use
● Features:
○ Canonicalization
○ Ease-of-parsing
○ Error detection
○ Automatic optimization
{
"table": "trips",
"joins": [
{
"alias": "g",
"table": "geofences",
"conditions": [
"geography_intersects(request_point, g.shape)"
]
}
], "dimensions": [
{
"sqlExpression": "request_at",
"timeBucketizer": "day",
"timeUnit": "millisecond"
}
], "measures": [
{
"sqlExpression": "count(*)",
"rowFilters": [
"status='completed'"
]
}
], "rowFilters": [
"city_id=1",
"g.uuid=0x0A"
], "timeFilter": {
"column": "request_at",
"from": "yesterday",
"to": "yesterday"
},
"timezone": "America/Los_Angeles"
}
Example
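Because AQL queries are structured JSON, canonicalization can be as simple as sorting keys, trimming strings, and ordering filter lists. Equivalent queries then produce the same cache key. This sketch assumes that rowFilters are ANDed, so their order does not matter. The helpers canonicalize and cache_key are hypothetical, not part of a documented AQL API.

```python
import hashlib
import json
from typing import Any

def canonicalize(query: Any) -> Any:
    """Normalize an AQL-like query: trim strings and sort order-insensitive filter lists."""
    if isinstance(query, dict):
        out = {k: canonicalize(v) for k, v in query.items()}
        if "rowFilters" in out:  # assumed ANDed, so order is irrelevant
            out["rowFilters"] = sorted(out["rowFilters"])
        return out
    if isinstance(query, list):
        return [canonicalize(v) for v in query]  # other lists keep their order
    if isinstance(query, str):
        return query.strip()
    return query

def cache_key(query: dict) -> str:
    canonical = json.dumps(canonicalize(query), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

q1 = {"table": "trips", "rowFilters": ["city_id=1", "status='completed'"]}
q2 = {"rowFilters": [" status='completed'", "city_id=1 "], "table": "trips"}
q3 = {"table": "trips", "rowFilters": ["city_id=2"]}
print(cache_key(q1) == cache_key(q2), cache_key(q1) == cache_key(q3))  # True False
```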
Why SQL is hard for time series OLAP
● Date/time functions:
○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) % 900000) / 1000), 'GMT', 'America/Los_Angeles'), '%Y-%m-%d'), 'America/Los_Angeles', 'UTC')) / 0.001, 0)
○ Cheap timestamp snapping to 15m
○ Conversion from milliseconds to seconds
○ Conversion from Unix timestamp to SQL time
○ Adding timezone to Unix time
○ Date/time formatting/truncation
○ Timezone conversion
○ Conversion from SQL time to Unix timestamp
○ Conversion from seconds to milliseconds
Field: Value
Dimension.SQLExpression: request_at
Dimension.TimeBucketizer: day
Dimension.TimeUnit: millisecond
Timezone: America/Los_Angeles
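For a single row, the generated expression computes roughly the following. This Python sketch assumes request_at is in epoch milliseconds and takes the time zone as a tzinfo. MySQL/MemSQL session time zone details may make the real SQL differ at the edges. The name day_bucket_ms is hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo

FIFTEEN_MIN_MS = 15 * 60 * 1000  # 900000 ms, as in the SQL expression

def day_bucket_ms(request_at_ms: int, tz: tzinfo) -> int:
    """Epoch ms (UTC) of local midnight, in tz, of the day containing request_at_ms."""
    snapped = request_at_ms - request_at_ms % FIFTEEN_MIN_MS             # cheap 15m snapping
    utc = datetime.fromtimestamp(snapped // 1000, tz=timezone.utc)        # ms -> s -> datetime
    local = utc.astimezone(tz)                                            # GMT -> local zone
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)   # truncate to day
    return int(midnight.timestamp()) * 1000                               # back to UTC epoch ms

# Fixed UTC-7 (PDT) offset for the example. Pass ZoneInfo("America/Los_Angeles")
# (Python 3.9+) for DST-aware results.
PDT = timezone(timedelta(hours=-7))
ts = int(datetime(2016, 3, 22, 13, 17, tzinfo=timezone.utc).timestamp()) * 1000
print(datetime.fromtimestamp(day_bucket_ms(ts, PDT) // 1000, tz=timezone.utc))
# 2016-03-22 07:00:00+00:00
```

For the city-based time zone variant, tz would come from a per-row lookup instead. One example would be a hypothetical city_id to ZoneInfo mapping built from api_cities.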
Why SQL is hard for time series OLAP
● City/Region/Country based timezone
○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) % 900000) / 1000), 'GMT', __tz__.sub_region_timezone), '%Y-%m-%d'), __tz__.sub_region_timezone, 'UTC')) / 0.001, 0) FROM trips JOIN api_cities AS __tz__ ON trips.city_id = __tz__.id
○ Join with api_cities (which has timezone info of each level) on city_id
○ Use the corresponding timezone column from api_cities
Field: Value
Dimension.SQLExpression: request_at
Dimension.TimeBucketizer: day
Dimension.TimeUnit: millisecond
Timezone: sub_region_timezone(city_id)
Why SQL is hard for time series OLAP
● #completed_trips / #requested_trips
○ SUM(CASE WHEN trips.status='completed' THEN 1 ELSE 0 END) / SUM(CASE WHEN trips.status!='ignored' THEN 1 ELSE 0 END)
○ SELECT …, _1.completed / _2.requested FROM (SELECT …, COUNT(*) AS completed FROM trips WHERE status='completed' GROUP BY ...) AS _1 JOIN (SELECT …, COUNT(*) AS requested FROM trips WHERE status!='ignored' GROUP BY ...) AS _2 ON ...
○ Filters make measures complex
Field: Value
Measure[0].SQLExpression: count(*)
Measure[0].Filters: status='completed'
Measure[0].Alias: completed
Measure[1].SQLExpression: count(*)
Measure[1].Filters: status!='ignored'
Measure[1].Alias: requested
Measure[2].SQLExpression: completed / requested
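The filtered measures can be evaluated in one pass, which is the same idea as the SUM(CASE WHEN ...) form. Here is a minimal Python sketch with a hypothetical completion_rate helper over dictionaries that have a status field.

```python
from typing import Dict, Iterable

def completion_rate(trips: Iterable[Dict[str, str]]) -> float:
    """#completed / #requested in one pass, like SUM(CASE WHEN ...) / SUM(CASE WHEN ...)."""
    completed = requested = 0
    for t in trips:
        completed += 1 if t["status"] == "completed" else 0  # Measure[0]: filtered count(*)
        requested += 1 if t["status"] != "ignored" else 0    # Measure[1]: filtered count(*)
    # Measure[2]: completed / requested (NaN when nothing was requested)
    return completed / requested if requested else float("nan")

rate = completion_rate([{"status": s} for s in ["completed", "completed", "canceled", "ignored"]])
print(rate)  # 2 completed out of 3 requested
```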
Why SQL is hard for time series OLAP
● #Trips by geofence for geofences A, B and C
○ SELECT count(*), geofences.uuid FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE geofences.uuid IN (A, B, C) GROUP BY geofences.uuid
● Total #Trips for geofences A, B and C
○ SELECT count(*) FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE geofences.uuid IN (A, B, C)
● Overlapping geofences are OK, overcounting is not! (The JOIN counts a trip once per matching geofence.)
○ SELECT count(*) FROM trips WHERE EXISTS (SELECT * FROM geofences WHERE geography_intersects(trips.request_point, geofences.shape) AND geofences.uuid IN (A, B, C))
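This Python sketch shows the overcounting difference between JOIN and EXISTS. Axis-aligned boxes stand in for geofence polygons, and intersects is a hypothetical stand-in for geography_intersects. Fences A and B overlap, so one trip matches both.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def intersects(p: Point, box: Box) -> bool:
    """Hypothetical stand-in for geography_intersects: point-in-box test."""
    x, y = p
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]

def count_join(points: List[Point], fences: Dict[str, Box]) -> int:
    """Like JOIN ... WHERE uuid IN (A, B, C): one row per (trip, matching geofence)."""
    return sum(1 for p in points for f in fences.values() if intersects(p, f))

def count_exists(points: List[Point], fences: Dict[str, Box]) -> int:
    """Like WHERE EXISTS (...): each trip is counted at most once."""
    return sum(1 for p in points if any(intersects(p, f) for f in fences.values()))

fences = {"A": (0, 0, 2, 2), "B": (1, 1, 3, 3), "C": (10, 10, 11, 11)}
trips = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5), (5, 5)]
print(count_join(trips, fences), count_exists(trips, fences))  # 4 3
```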
Bad SQL queries
● SELECT count(*), request_at FROM trips GROUP BY request_at;
○ Time needs to be bucketized! Grouping by milliseconds makes no sense!
● SELECT count(*), fare_total FROM trips GROUP BY fare_total;
○ Some numeric values, such as fare, need to be bucketized (reported as histograms)!
● SELECT sum(fare_total) FROM trips, other_table WHERE trips.fare_total > 1.0 AND other_table.foo = 'BAR';
○ The join condition is missing, and a Cartesian product is bad!
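AQL's structure makes some of these mistakes detectable before any SQL is generated. A hypothetical validator over the JSON fields shown in the AQL example (dimensions with timeBucketizer, joins with conditions) might look like this. Numeric bucketization is omitted because the slides do not show its AQL field. TIME_COLUMNS is an assumed schema hint.

```python
from typing import Dict, List

TIME_COLUMNS = {"request_at"}  # hypothetical: columns known to hold epoch timestamps

def validate(query: Dict) -> List[str]:
    """Return errors for an AQL-style query that would compile to a bad SQL query."""
    errors: List[str] = []
    for dim in query.get("dimensions", []):
        expr = dim.get("sqlExpression")
        if expr in TIME_COLUMNS and "timeBucketizer" not in dim:
            errors.append(f"time dimension {expr!r} must be bucketized")
    for join in query.get("joins", []):
        if not join.get("conditions"):
            errors.append(f"join with {join.get('table')!r} has no condition (Cartesian product)")
    return errors

bad = {"table": "trips",
       "dimensions": [{"sqlExpression": "request_at"}],
       "joins": [{"alias": "o", "table": "other_table"}]}
good = {"table": "trips",
        "dimensions": [{"sqlExpression": "request_at", "timeBucketizer": "day"}],
        "joins": [{"alias": "g", "table": "geofences",
                   "conditions": ["geography_intersects(request_point, g.shape)"]}]}
print(validate(bad))
print(validate(good))  # []
```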
AQL Query Optimization
Date/time function performance issue
● CONCAT(DATE_FORMAT(FROM_UNIXTIME((__d0__) / 1000), '%Y-%m-%d '), LPAD(3 * FLOOR(HOUR(FROM_UNIXTIME((__d0__) / 1000)) / 3), 2, '0'), ':00')
● Runs for every row (trip)!
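Two-stage aggregation
● Naive: apply the date/time bucketization function to request_at for every row, then count(*)
● Stage 1: group by the cheap expression t - t % 15m and compute count(*) as c
● Stage 2: apply the date/time bucketization function once per 15m bucket and compute sum(c)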
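Here is the two-stage idea as a Python sketch. It assumes that output buckets (hour, day, ...) are unions of whole 15-minute buckets, which holds for time zones whose UTC offset is a multiple of 15 minutes. The function hour_label is a hypothetical stand-in for the expensive SQL expression.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

FIFTEEN_MIN_MS = 15 * 60 * 1000

def two_stage_count(request_ats: Iterable[int],
                    bucketize: Callable[[int], str]) -> Dict[str, int]:
    """count(*) per output bucket, running the expensive bucketize once per 15m bucket."""
    # Stage 1: cheap integer math per row, GROUP BY t - t % 15m, count(*) AS c.
    stage1 = Counter(t - t % FIFTEEN_MIN_MS for t in request_ats)
    # Stage 2: expensive date/time bucketization per 15m bucket, then sum(c).
    stage2: Counter = Counter()
    for bucket_start, c in stage1.items():
        stage2[bucketize(bucket_start)] += c
    return dict(stage2)

def hour_label(t_ms: int) -> str:
    # Stand-in for the expensive SQL date/time expression (UTC hour bucket).
    return f"hour-{t_ms // 3_600_000}"

ts = [0, 1_000, 900_000, 3_600_000, 3_600_001, 7_199_999]
print(two_stage_count(ts, hour_label))  # {'hour-0': 3, 'hour-1': 3}
```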
Time Series Bucket Splitting
Example: From: this week, To: now (Now: 2016-03-22 13:17)
● Split: the partial week 2016-03-21 becomes 2016-03-21 (day); 2016-03-22 00:00, 01:00, ..., 12:00 (hours); 2016-03-22 13:00 (15m); and the partial 15m bucket 2016-03-22 13:15, which is split further into 2016-03-22 13:15 and 13:16 (minutes)
● Rollup: the split buckets are aggregated back into the 2016-03-21 (partial week) result
Time Series Bucket Splitting
Example: From: -20d, To: -12h, BucketSize: week (Now: 2016-03-22 13:17)
● 2016-03-02 (partial week): split into 2016-03-02, 2016-03-03, ..., 2016-03-06 (days), then rolled up
● 2016-03-07 (week)
● 2016-03-14 (week)
● 2016-03-21 (partial week): split into 2016-03-21 (day), 2016-03-22 00:00 (hour), 2016-03-22 01:00 (hour), then rolled up
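Here is a greedy sketch of bucket splitting: walk from the start and repeatedly take the largest bucket that is aligned at the cursor and still ends by the range end. The sketch assumes UTC alignment and minute-aligned inputs, and it omits weeks. The names split_range and BUCKETS are hypothetical. Run on the first example (from 2016-03-21 00:00 to 2016-03-22 13:17), it yields one day, 13 hours, one 15m bucket, and two minutes.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Candidate bucket sizes, coarsest first (assumption: weeks omitted, UTC-aligned).
BUCKETS = [("day", timedelta(days=1)), ("hour", timedelta(hours=1)),
           ("15m", timedelta(minutes=15)), ("minute", timedelta(minutes=1))]

def split_range(start: datetime, end: datetime) -> List[Tuple[datetime, str]]:
    """Greedily cover [start, end) with the largest aligned buckets that fit."""
    out: List[Tuple[datetime, str]] = []
    cur = start
    while cur < end:
        for name, size in BUCKETS:
            # A bucket qualifies if it starts on its own boundary and ends by `end`.
            if (cur - EPOCH) % size == timedelta(0) and cur + size <= end:
                out.append((cur, name))
                cur += size
                break
        else:
            raise ValueError("start/end must be minute-aligned")
    return out

now = datetime(2016, 3, 22, 13, 17, tzinfo=timezone.utc)
buckets = split_range(datetime(2016, 3, 21, tzinfo=timezone.utc), now)
for ts, name in buckets:
    print(ts.strftime("%Y-%m-%d %H:%M"), name)
```

Each resulting bucket could then be served independently (for example from a cache of partial results) and rolled up into the requested bucket size.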
AQL Query Optimization
Aggregate rollups
avg(x) = sum(x) / count(*)
Original function → Stage 1 → Stage 2 (rollup):
● count → count → sum
● sum → sum → sum
● min → min → min
● max → max → max
● count distinct → distinct → count distinct (or HyperLogLog)
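A small Python sketch can check the rollup rules. Per-bucket partials (Stage 1) combine into exact totals (Stage 2), and avg is derived from sum and count rather than averaged. The functions stage1 and rollup are hypothetical. Count distinct uses an exact set union here, where a production system might use HyperLogLog.

```python
from typing import Dict, List

def stage1(values: List[int]) -> Dict:
    """Stage 1: partial aggregates for one (non-empty) partition or time bucket."""
    return {"count": len(values), "sum": sum(values), "min": min(values),
            "max": max(values), "distinct": set(values)}

def rollup(partials: List[Dict]) -> Dict:
    """Stage 2: combine partials using the rollup function for each aggregate."""
    count = sum(p["count"] for p in partials)  # count -> sum
    total = sum(p["sum"] for p in partials)    # sum -> sum
    distinct = set().union(*(p["distinct"] for p in partials))
    return {"count": count, "sum": total,
            "min": min(p["min"] for p in partials),  # min -> min
            "max": max(p["max"] for p in partials),  # max -> max
            "avg": total / count,                     # avg(x) = sum(x) / count(*)
            "count_distinct": len(distinct)}          # exact; HyperLogLog would approximate

r = rollup([stage1([1, 2, 3]), stage1([3, 4])])
print(r["count"], r["sum"], r["avg"], r["count_distinct"])  # 5 13 2.6 4
```

Averaging the two per-bucket averages (2.0 and 3.5) would give 2.75 instead of 2.6. This is why avg is rolled up through sum and count.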
Contracts
SELECT AVG(fare), ts_15m FROM trips WHERE time >= (now() - 1h) [AND city = x] GROUP BY ts_15m [, city];
Latency by time range (1h / 24h / 21d grouped by 24h):
● where city = x, single query (p95): 50ms / 60ms / 70ms
● for x in cities, one query per city (summed): ~9s / ~10s / ~12s
● group by city, single query (p95): 200ms / ~1s / ~7s
Contracts
SELECT COUNT(1), AVG(fare), SUM(fare), AVG(eta) FROM trips WHERE ...
SELECT COUNT(1), AVG(fare), SUM(fare), SUM(eta) FROM trips WHERE ...
Contracts
SELECT COUNT(1) FROM trips WHERE ... / GROUP BY ..., over any combination of:
● City = 'San Francisco'
● State = 'completed'
● Product = 'Uber-X'
Combinations: (City, State, Product), (City, State), (City, Product), (City), (State), (State, Product), (Product), (∅)
Geographical breakdowns: World > North America > United States > US West > California > Bay Area > SF
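The eight filter/group-by shapes listed above are every subset of {City, State, Product}. Here is a Python sketch that enumerates them. The helper filter_combinations is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

def filter_combinations(dims: List[str]) -> List[Tuple[str, ...]]:
    """All subsets of the filter/group-by dimensions, from largest to the empty set."""
    return [c for r in range(len(dims), -1, -1) for c in combinations(dims, r)]

combos = filter_combinations(["City", "State", "Product"])
print(len(combos))  # 8
for c in combos:
    print(c if c else "(∅)")
```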
Stats
● p80 <= 10ms
● p90 <= 50ms
● p95 <= 100ms
● p99 <= 1000ms
● p99.5 <= 5000ms
● Millions of queries/day
● ~250k distinct queries
● Billions of MySQL writes/day
Future Plans (next 3-6 months)
● Product
○ Self-service onboarding and schema management
○ Schema change management and automation
● Technology
○ Cost Accounting
○ Contract automation
○ Query cost estimation
Challenges and Learnings
Schema Challenges
● Many Schemas:
○ Ingestion transformations
■ Hive
■ Avro-encoded Kafka
○ MemSQL Schema
○ Query layer schema
Ingestion
Performance differences for the largest job (Spark vs. Golang):
● Containers: 32 vs. 4
● CPU cores: 160 vs. 8
● Memory (GB): 226 vs. 16
● Throughput: 36k/s vs. 60k/s
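Normalizing the Spark and Golang figures by CPU cores gives a rough sense of the gap. This is back-of-the-envelope arithmetic on the numbers in the comparison, not a benchmark. The per_core helper is hypothetical.

```python
# Figures from the Spark vs. Golang comparison (largest ingestion job).
spark = {"containers": 32, "cores": 160, "memory_gb": 226, "throughput": 36_000}
golang = {"containers": 4, "cores": 8, "memory_gb": 16, "throughput": 60_000}

def per_core(job: dict) -> float:
    """Messages per second per CPU core."""
    return job["throughput"] / job["cores"]

print(per_core(spark), per_core(golang))  # 225.0 7500.0
ratio = per_core(golang) / per_core(spark)  # about 33x more throughput per core
```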
Questions?
(PS: We’re hiring)
Uber Engineering Blog
eng.uber.com
Uber Open Source
uber.github.io
Uber Eng Twitter
twitter.com/ubereng
These slides
https://tinyurl.com/apollostrata
msql.co/uberscale
Check out ‘Hoodie: Incremental processing on Hadoop at Uber’ Thursday 1:50-2:30 for the next Uber Strata presentation.

Real-Time Analytics at Uber Scale
