Impacts of Sharding, Partitioning, Encoding, and
Sorting on Distributed Query Performance
Nga Tran
Staff Engineer, InfluxData
July 14, 2021
● InfluxData - Staff Engineer
● Tableau/Salesforce (2 years)
○ Sr. Manager of Automatic Statistics
● Vertica RDBMS (over a decade)
○ Engineer of Query Optimizer
○ Director of Engineering (R&D)
● ELCA (4 years)
Outline
● Non-distributed vs Distributed Databases
● Splitting Data to Gain Query Performance
○ Sharding, Partitioning, Encoding, and Sorting
● Impacts of different data setups on Query Performance
Distributed Database
Non-Distributed DB: 1-node cluster
● 1 machine
● Data is loaded & then queried on that node
Distributed DB: Cluster of many nodes
● Several machines share the work
● Data is horizontally split between nodes
● Data is queried from all nodes
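[Figure: a non-distributed DB is a single node that holds every row. A distributed DB has N nodes (Node 1, Node 2, ..., Node n); each plays the same role and talks to the others, and the rows are split horizontally among them: rows 1..a on Node 1, rows a+1..b on Node 2, ..., rows x+1..n on Node n.]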
→ How to split data to gain query performance?
Splitting Data to Gain Query Performance
● Sharding
○ Horizontally split a table into N non-overlapping shards
■ → each node takes an (ideally equal) 1/N share of the workload:
● Load: 1/N of the data goes to each node
● Query: each node does 1/N of the join & group-by work
● Partitioning
○ Each shard is further split into smaller partitions for better data filtering, deleting, fanning
out, and local parallelism
● Encoding
○ Each column is encoded (sorted & compressed) to further help with join, filtering, group-by, and order-by
→ Let us dig into examples
Examples: Two tables Order & Line_Item
Order
o_okey o_date o_pri
1 2021.05.01 2
2 2021.05.01 1
3 2021.05.02 1
4 2021.05.02 3
5 2021.05.02 1
Line_Item
l_okey l_name l_price l_shipdate
1 desk 100 2021.05.07
1 chair 50 2021.05.03
1 monitor 130 2021.05.03
1 mouse 10 2021.05.07
2 pot 20 2021.05.01
2 pan 25 2021.05.04
3 shirt 30 2021.05.10
4 bike 120 2021.05.04
4 helmet 30 2021.05.10
5 kayak 200 2021.05.05
5 lifevest 20 2021.05.02
Examples: 2-node cluster
Sharded : Order: (o_okey % 2) & Line_Item: (l_okey % 2)
Node 1
Order
o_okey o_date o_pri
1 2021.05.01 2
3 2021.05.01 1
5 2021.05.02 1
Line_Item
l_okey l_name l_price l_shipdate
1 desk 100 2021.05.07
1 chair 50 2021.05.03
1 monitor 130 2021.05.03
1 mouse 10 2021.05.07
3 shirt 30 2021.05.02
5 kayak 200 2021.05.07
5 lifevest 20 2021.05.02
Node 2
Order
o_okey o_date o_pri
2 2021.05.01 1
4 2021.05.02 3
Line_Item
l_okey l_name l_price l_shipdate
2 pot 20 2021.05.01
2 pan 25 2021.05.04
4 bike 120 2021.05.04
4 helmet 30 2021.05.10
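The modulo sharding on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular database routes rows: `shard_rows` is a hypothetical helper, and the rows are the Order rows as they appear in the sharded view.

```python
from collections import defaultdict

def shard_rows(rows, key_index, num_nodes):
    """Group rows by (key % num_nodes); hypothetical helper for illustration."""
    shards = defaultdict(list)
    for row in rows:
        shards[row[key_index] % num_nodes].append(row)
    return dict(shards)

# Order rows: (o_okey, o_date, o_pri)
orders = [(1, "2021.05.01", 2), (2, "2021.05.01", 1), (3, "2021.05.01", 1),
          (4, "2021.05.02", 3), (5, "2021.05.02", 1)]

order_shards = shard_rows(orders, key_index=0, num_nodes=2)

# Remainder 1 holds the odd keys (Node 1 on the slide),
# remainder 0 holds the even keys (Node 2 on the slide).
odd_keys = [row[0] for row in order_shards[1]]
even_keys = [row[0] for row in order_shards[0]]
```

Applying the same function to Line_Item on l_okey puts every line item on the node that holds its order, which is what the later join slides rely on.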
Examples: 2-node cluster
Sharded : Order: (o_okey % 2) & Line_Item: (l_okey % 2)
Partitioned : Order: (o_date) & Line_Item: (l_shipdate)
Node 1
Order
o_okey o_date o_pri
1 2021.05.01 2
3 2021.05.01 1
5 2021.05.02 1
Line_Item
l_okey l_name l_price l_shipdate
3 shirt 30 2021.05.02
5 lifevest 20 2021.05.02
1 chair 50 2021.05.03
1 monitor 130 2021.05.03
1 desk 100 2021.05.07
1 mouse 10 2021.05.07
5 kayak 200 2021.05.07
Node 2
Order
o_okey o_date o_pri
2 2021.05.01 1
4 2021.05.02 3
Line_Item
l_okey l_name l_price l_shipdate
2 pot 20 2021.05.01
2 pan 25 2021.05.04
4 bike 120 2021.05.04
4 helmet 30 2021.05.10
Examples: 2-node cluster
Sharded : Order: (o_okey % 2) & Line_Item: (l_okey % 2)
Partitioned : Order: (o_date) & Line_Item: (l_shipdate)
Encoded & Sorted : Order: (o_okey) & Line_Item: RLE(l_okey)
RLE pairs are (value, run length); a row without a pair belongs to the run above it.
Node 1
Order
o_okey o_date o_pri
1 2021.05.01 2
3 2021.05.01 1
5 2021.05.02 1
Line_Item
l_okey l_name l_price l_shipdate
(3,1) shirt 30 2021.05.02
(5,1) lifevest 20 2021.05.02
(1,2) chair 50 2021.05.03
monitor 130 2021.05.03
(1,2) desk 100 2021.05.07
mouse 10 2021.05.07
(5,1) kayak 200 2021.05.07
Node 2
Order
o_okey o_date o_pri
2 2021.05.01 1
4 2021.05.02 3
Line_Item
l_okey l_name l_price l_shipdate
(2,1) pot 20 2021.05.01
(2,1) pan 25 2021.05.04
(4,1) bike 120 2021.05.04
(4,1) helmet 30 2021.05.10
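The (value, run length) notation used for l_okey can be reproduced with a small run-length encoder. This is a minimal sketch that assumes the column is already sorted within the partition; `rle_encode` and `rle_decode` are illustrative names, not part of any database API.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

# l_okey values of Node 1's Line_Item partition for ship date 2021.05.07
# (desk, mouse, kayak), already sorted on l_okey.
l_okey_column = [1, 1, 5]

encoded = rle_encode(l_okey_column)
decoded = rle_decode(encoded)
```

Here `encoded` is `[(1, 2), (5, 1)]`, matching the pairs shown for that partition. The saving grows with the run length, so it is largest when many line items share one order key.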
Impacts of the setups on query performance
Examples: Query
SELECT
l_okey, sum(l_price) as revenue, o_date, o_pri
FROM
orders, lineitem
WHERE
l_okey = o_okey and o_date < DATE '2021-05-02' and
l_shipdate > DATE '2021-05-03'
GROUP BY
l_okey, o_date, o_pri
ORDER BY
revenue desc, o_date;
Examples: Query - Do the shards help?
YES
● Join: l_okey = o_okey
○ → All odd keys are on node 1 and all even keys are on node 2
○ → Node 1 and node 2 each join their local data. No need to shuffle data between nodes before
joining.
● Group By: l_okey, o_date, o_pri
○ → Similarly, rows with the same group-by key are on the same node, because l_okey is part of the key. Each node can
aggregate its data without reshuffling
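The claim that co-sharded tables can be joined node by node can be checked on a column subset of the example data. This is a sketch under assumptions: `hash_join` is a hypothetical helper, and the per-node row lists stand in for what each node stores locally.

```python
def hash_join(orders, line_items):
    """Join (o_okey, o_date) rows with (l_okey, l_name) rows on the order key."""
    dates = {o_okey: o_date for o_okey, o_date in orders}
    return [(l_okey, l_name, dates[l_okey])
            for l_okey, l_name in line_items if l_okey in dates]

# Rows per node, both tables sharded on key % 2 (subset of the slide data).
node1 = {"orders": [(1, "2021.05.01"), (3, "2021.05.01"), (5, "2021.05.02")],
         "items": [(1, "desk"), (1, "chair"), (3, "shirt"), (5, "kayak")]}
node2 = {"orders": [(2, "2021.05.01"), (4, "2021.05.02")],
         "items": [(2, "pot"), (4, "bike")]}

# Each node joins only its own rows; nothing crosses the network.
local = (hash_join(node1["orders"], node1["items"])
         + hash_join(node2["orders"], node2["items"]))

# Reference: one join over all rows gathered in a single place.
combined = hash_join(node1["orders"] + node2["orders"],
                     node1["items"] + node2["items"])

same_result = sorted(local) == sorted(combined)
```

The two results are identical because no order key appears on both nodes, so a node never needs a row held by the other node.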
What if Order is not sharded on o_okey and Line_Item is not sharded on l_okey?
● Join: l_okey = o_okey
○ → Need to reshuffle data so that equal join keys land on the same node before joining. Several options:
■ Reshard both tables on the fly: Order on o_okey and Line_Item on l_okey
■ Broadcast the smaller table (Order) to the other nodes
● Group By: l_okey, o_date, o_pri
○ → If the data is sharded on l_okey after the join, nothing more is needed. Otherwise, either:
■ Reshard the data on l_okey across the 2 nodes
■ Send everything to one node to do the final group-by
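To make the reshuffle cost concrete, the sketch below reshards Line_Item rows on the fly and counts how many rows have to change node. The round-robin starting layout is an assumption made for illustration, and `reshard` is a hypothetical helper rather than a real engine operator.

```python
def reshard(nodes, key_index=0):
    """Redistribute rows so that row[key_index] % len(nodes) picks the node.

    Returns the new per-node row lists and the number of rows that moved.
    """
    num_nodes = len(nodes)
    resharded = [[] for _ in range(num_nodes)]
    moved = 0
    for source, rows in enumerate(nodes):
        for row in rows:
            target = row[key_index] % num_nodes
            if target != source:
                moved += 1  # this row must be sent over the network
            resharded[target].append(row)
    return resharded, moved

# Assumed starting point: (l_okey, l_name) rows dealt round-robin to two
# nodes, so line items of the same order are scattered.
line_items = [(1, "desk"), (1, "chair"), (1, "monitor"), (1, "mouse"),
              (2, "pot"), (2, "pan"), (3, "shirt"), (4, "bike"),
              (4, "helmet"), (5, "kayak"), (5, "lifevest")]
round_robin = [line_items[0::2], line_items[1::2]]

# List index 0 ends up with the even keys, index 1 with the odd keys.
resharded, moved = reshard(round_robin)
```

In this layout 6 of the 11 rows move before the join can start. That network transfer is the cost avoided when the tables are already sharded on the join key.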
● → Not sharding on the join keys leads to extra on-the-fly reshard or broadcast cost
● → If the data is not already (re-)sharded on the group-by keys before the group-by operator, then either
○ it must be resharded, or
○ one final node has to do all the group-by work
Examples: Query - Do the partitions help?
Yes
● Filter: o_date < 2021.05.02 and l_shipdate > 2021.05.03
○ → Prune partitions not in the filter ranges
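Partition pruning can be sketched as a filter over partition keys that runs before any row is read. The dictionary layout and the `prune` helper below are assumptions for illustration; the rows are Node 1's Line_Item rows from the partitioned setup.

```python
# Node 1's Line_Item shard, stored as one partition per l_shipdate.
# Rows are (l_okey, l_name, l_price).
partitions = {
    "2021.05.02": [(3, "shirt", 30), (5, "lifevest", 20)],
    "2021.05.03": [(1, "chair", 50), (1, "monitor", 130)],
    "2021.05.07": [(1, "desk", 100), (1, "mouse", 10), (5, "kayak", 200)],
}

def prune(partitions, keep):
    """Return only the partitions whose key satisfies the predicate."""
    return {key: rows for key, rows in partitions.items() if keep(key)}

# l_shipdate > 2021.05.03. Fixed-width yyyy.mm.dd strings compare in date
# order, so plain string comparison is enough for this sketch.
kept = prune(partitions, lambda ship_date: ship_date > "2021.05.03")

rows_scanned = sum(len(rows) for rows in kept.values())
total_rows = sum(len(rows) for rows in partitions.values())
```

Only the 2021.05.07 partition survives, so 3 of the 7 rows are scanned. The other two partitions are skipped without looking at their rows.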
What if Order is not partitioned on o_date and Line_Item not partitioned on l_shipdate?
● → Nothing can be pruned early; we have to scan all the column data and apply the filter ranges
Examples: Query - Do the encoding & sorting help?
Yes
● Join: l_okey = o_okey
○ → Use the faster, more memory-efficient merge join because the data is already sorted on the join keys
○ → l_okey can stay RLE-encoded during the join
● Group By: l_okey, o_date, o_pri
○ → The group-by key is sorted, so no hash group-by is needed: simply aggregate rows as new batches arrive until we reach
a higher key value
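Both points can be sketched together: a merge join that walks two sorted inputs once, followed by a group-by that only compares neighbouring rows. The sketch assumes the inputs are Node 1's rows after the date filters, sorted on the order key, with unique order keys; `merge_join` and `sorted_group_sum` are illustrative names.

```python
from itertools import groupby

def merge_join(orders, line_items):
    """Join two inputs sorted on the order key; order keys are unique."""
    joined, i = [], 0
    for l_okey, l_price in line_items:
        # Advance the orders cursor; it never moves backwards.
        while i < len(orders) and orders[i][0] < l_okey:
            i += 1
        if i < len(orders) and orders[i][0] == l_okey:
            _, o_date, o_pri = orders[i]
            joined.append((l_okey, o_date, o_pri, l_price))
    return joined

def sorted_group_sum(rows):
    """Sum l_price per (l_okey, o_date, o_pri); equal keys must be adjacent."""
    return [(key, sum(row[3] for row in group))
            for key, group in groupby(rows, key=lambda row: row[:3])]

# Node 1 after the filters: orders with o_date before 2021.05.02 and
# line items (l_okey, l_price) shipped after 2021.05.03.
orders = [(1, "2021.05.01", 2), (3, "2021.05.01", 1)]
line_items = [(1, 100), (1, 10), (5, 200)]

revenue = sorted_group_sum(merge_join(orders, line_items))
```

Neither step builds a hash table: the join keeps one cursor per input and the aggregation keeps one running group, which is why memory use stays low.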
What if Order is not sorted on o_okey and Line_Item is not RLE on l_okey?
● → Use hash join instead (usually slower and requires more memory than merge join)
● → Use hash group-by (similarly, usually slower and requires more memory than pipelined group-by)
● → If there are only a few line items per order, RLE won’t save much space
How to design sharding, partitioning, encoding, and sorting
for a combination of queries?
Database Designer:
● Topic for another talk
● Startup: Ottertune https://ottertune.com
○ Database Optimization on Autopilot
So what have we demonstrated today?
● Sharding
○ Horizontally split a table into N non-overlapping shards
■ → each node takes an (ideally equal) 1/N share of the workload:
● Load: 1/N of the data goes to each node
● Query: each node does 1/N of the join & group-by work
● Partitioning
○ Each shard is further split into smaller partitions for better data filtering, deleting, fanning
out, and local parallelism
● Encoding
○ Each column is encoded (sorted & compressed) to further help with join, filtering, group-by, and order-by
→ Can you think of examples for the cases we have not covered?
Thank you
Ad

Recommended

InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxData
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxData
 
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxData
 
Catalogs - Turning a Set of Parquet Files into a Data Set
Catalogs - Turning a Set of Parquet Files into a Data Set
InfluxData
 
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxData
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxData
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
Databricks
 
Understanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage Engine
InfluxData
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 
Using Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
Altinity Ltd
 
Ceph and RocksDB
Ceph and RocksDB
Sage Weil
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Julien Le Dem
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
 
Presto
Presto
Knoldus Inc.
 
Cassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
Yugabyte
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
Portable UDFs: Write Once, Run Anywhere
Portable UDFs: Write Once, Run Anywhere
Databricks
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparound
Masahiko Sawada
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Databricks
 
RocksDB compaction
RocksDB compaction
MIJIN AN
 
MyRocks Deep Dive
MyRocks Deep Dive
Yoshinori Matsunobu
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
Mydbops
 
InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)
I Goo Lee.
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
MongoDB
 
1Horizontal and vertical partitioning of data.pptx
1Horizontal and vertical partitioning of data.pptx
MuhammadAliAzamKhatt
 
Database Sharding: Complete understanding
Database Sharding: Complete understanding
servicesNitor
 

More Related Content

What's hot (20)

Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 
Using Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
Altinity Ltd
 
Ceph and RocksDB
Ceph and RocksDB
Sage Weil
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Julien Le Dem
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
 
Presto
Presto
Knoldus Inc.
 
Cassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
Yugabyte
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
Portable UDFs: Write Once, Run Anywhere
Portable UDFs: Write Once, Run Anywhere
Databricks
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparound
Masahiko Sawada
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Databricks
 
RocksDB compaction
RocksDB compaction
MIJIN AN
 
MyRocks Deep Dive
MyRocks Deep Dive
Yoshinori Matsunobu
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
Mydbops
 
InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)
I Goo Lee.
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
MongoDB
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 
Using Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
Altinity Ltd
 
Ceph and RocksDB
Ceph and RocksDB
Sage Weil
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Julien Le Dem
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
 
Cassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
Yugabyte
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
Portable UDFs: Write Once, Run Anywhere
Portable UDFs: Write Once, Run Anywhere
Databricks
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparound
Masahiko Sawada
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Databricks
 
RocksDB compaction
RocksDB compaction
MIJIN AN
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
Mydbops
 
InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)
I Goo Lee.
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
MongoDB
 

Similar to Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query Performance (20)

1Horizontal and vertical partitioning of data.pptx
1Horizontal and vertical partitioning of data.pptx
MuhammadAliAzamKhatt
 
Database Sharding: Complete understanding
Database Sharding: Complete understanding
servicesNitor
 
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Citus Data
 
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Citus Data
 
Lecture Notes Unit3 chapter21 - parallel databases
Lecture Notes Unit3 chapter21 - parallel databases
Murugan146644
 
Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)
Federico Razzoli
 
Scaling MongoDB with Horizontal and Vertical Sharding
Scaling MongoDB with Horizontal and Vertical Sharding
Mydbops
 
Optimize the Large Scale Graph Applications by using Apache Spark with 4-5x P...
Optimize the Large Scale Graph Applications by using Apache Spark with 4-5x P...
Databricks
 
Distribution Models.pptxgdfgdfgdfgfdgdfg
Distribution Models.pptxgdfgdfgdfgfdgdfg
zmulani8
 
Scalable data systems at Traveloka
Scalable data systems at Traveloka
Rendy Bambang Junior
 
Data sharding
Data sharding
Aditi Anand
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQL
Jim Mlodgenski
 
Understanding Database Sharding and Partitioning
Understanding Database Sharding and Partitioning
Hitechnectar
 
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
ryanthiessen
 
Scaling-MongoDB-with-Horizontal-and-Vertical-Sharding Mydbops Opensource Data...
Scaling-MongoDB-with-Horizontal-and-Vertical-Sharding Mydbops Opensource Data...
Mydbops
 
Optimizations in Spark; RDD, DataFrame
Optimizations in Spark; RDD, DataFrame
Knoldus Inc.
 
Streaming SQL
Streaming SQL
Julian Hyde
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / Sharding
Amir Reza Hashemi
 
NOSQL DATABASES UNIT-3 FOR ENGINEERING STUDENTS
NOSQL DATABASES UNIT-3 FOR ENGINEERING STUDENTS
Abcd463572
 
Distributed RDBMS: Challenges, Solutions & Trade-offs
Distributed RDBMS: Challenges, Solutions & Trade-offs
Ahmed Magdy Ezzeldin, MSc.
 
1Horizontal and vertical partitioning of data.pptx
1Horizontal and vertical partitioning of data.pptx
MuhammadAliAzamKhatt
 
Database Sharding: Complete understanding
Database Sharding: Complete understanding
servicesNitor
 
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Citus Data
 
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Citus Data
 
Lecture Notes Unit3 chapter21 - parallel databases
Lecture Notes Unit3 chapter21 - parallel databases
Murugan146644
 
Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)
Federico Razzoli
 
Scaling MongoDB with Horizontal and Vertical Sharding
Scaling MongoDB with Horizontal and Vertical Sharding
Mydbops
 
Optimize the Large Scale Graph Applications by using Apache Spark with 4-5x P...
Optimize the Large Scale Graph Applications by using Apache Spark with 4-5x P...
Databricks
 
Distribution Models.pptxgdfgdfgdfgfdgdfg
Distribution Models.pptxgdfgdfgdfgfdgdfg
zmulani8
 
Scalable data systems at Traveloka
Scalable data systems at Traveloka
Rendy Bambang Junior
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQL
Jim Mlodgenski
 
Understanding Database Sharding and Partitioning
Understanding Database Sharding and Partitioning
Hitechnectar
 
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
ryanthiessen
 
Scaling-MongoDB-with-Horizontal-and-Vertical-Sharding Mydbops Opensource Data...
Scaling-MongoDB-with-Horizontal-and-Vertical-Sharding Mydbops Opensource Data...
Mydbops
 
Optimizations in Spark; RDD, DataFrame
Optimizations in Spark; RDD, DataFrame
Knoldus Inc.
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / Sharding
Amir Reza Hashemi
 
NOSQL DATABASES UNIT-3 FOR ENGINEERING STUDENTS
NOSQL DATABASES UNIT-3 FOR ENGINEERING STUDENTS
Abcd463572
 
Distributed RDBMS: Challenges, Solutions & Trade-offs
Distributed RDBMS: Challenges, Solutions & Trade-offs
Ahmed Magdy Ezzeldin, MSc.
 
Ad

More from InfluxData (20)

Announcing InfluxDB Clustered
Announcing InfluxDB Clustered
InfluxData
 
Best Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow Ecosystem
InfluxData
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
InfluxData
 
Power Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDB
InfluxData
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
InfluxData
 
Build an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING Stack
InfluxData
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using Rust
InfluxData
 
Introducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud Dedicated
InfluxData
 
Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB
InfluxData
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
InfluxData
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
InfluxData
 
Introducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage Engine
InfluxData
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
InfluxData
 
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
InfluxData
 
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
InfluxData
 
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
InfluxData
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
InfluxData
 
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
InfluxData
 
Announcing InfluxDB Clustered
Announcing InfluxDB Clustered
InfluxData
 
Best Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow Ecosystem
InfluxData
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
InfluxData
 
Power Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDB
InfluxData
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
InfluxData
 
Build an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING Stack
InfluxData
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using Rust
InfluxData
 
Introducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud Dedicated
InfluxData
 
Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB
InfluxData
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
InfluxData
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
InfluxData
 
Introducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage Engine
InfluxData
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
InfluxData
 
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
InfluxData
 
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
InfluxData
 
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
InfluxData
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
InfluxData
 
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
InfluxData
 

Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query Performance

  • 1. Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query Performance Nga Tran Staff Engineer, InfluxData July 14, 2021
  • 2. ● InfluxData - Staff Engineer ● Tableau/Salesforce (2 years) ○ Sr. Manager of Automatic Statistics ● Vertica RDBMS (over a decade) ○ Engineer of Query Optimizer ○ Director of Engineering (R&D) ● ELCA (4 years)
  • 3. Outline ● Non-distributed vs Distributed Databases ● Splitting Data to Gain Query Performance ○ Sharding, Partitioning, Encoding, and Sorting ● Impacts of different data setups on Query Performance
  • 5. Distributed Database
    Non-Distributed DB: 1-node cluster
    ● 1 machine
    ● Data is loaded & then queried on that node
    Distributed DB: Cluster of many nodes
    ● Several machines share the work
    ● Data is horizontally split between nodes
    ● Data is queried from all nodes
    → How to split data to gain query performance?
    [Diagram: the Non-Distributed DB is a single node. The Distributed DB is Node 1, Node 2, ..., Node n; the N nodes each play the same role and talk to each other. Rows 1 to n are split into consecutive ranges (Row 1 to Row a, Row a+1 to Row b, ..., Row x+1 to Row n), one range per node.]
  • 7. Splitting Data to Gain Query Performance
    ● Sharding
      ○ Horizontally split a table into N non-overlapping shards
        ■ → each node takes an (equal) 1/N share of the workload:
          ● Load: 1/N of the data goes to each node
          ● Query: each node does 1/N of the join & group-by work
    ● Partitioning
      ○ Each shard is further split into smaller partitions for better data filtering, deleting, fanning out, and local parallelism
    ● Encoding
      ○ Each column is encoded (sorted & compressed) to further help with join, filtering, group-by, and order-by
    → Let us dig into examples
  • 8. Examples: Two tables Order & Line_Item
    Order
      o_okey  o_date      o_pri
      1       2021.05.01  2
      2       2021.05.01  1
      3       2021.05.02  1
      4       2021.05.02  3
      5       2021.05.02  1
    Line_Item
      l_okey  l_name    l_price  l_shipdate
      1       desk      100      2021.05.07
      1       chair     50       2021.05.03
      1       monitor   130      2021.05.03
      1       mouse     10       2021.05.07
      2       pot       20       2021.05.01
      2       pan       25       2021.05.04
      3       shirt     30       2021.05.10
      4       bike      120      2021.05.04
      4       helmet    30       2021.05.10
      5       kayak     200      2021.05.05
      5       lifevest  20       2021.05.02
  • 9. Examples: 2-node cluster
    Sharded: Order: (o_okey % 2) & Line_Item: (l_okey % 2)
    Node 1
      Order
        o_okey  o_date      o_pri
        1       2021.05.01  2
        3       2021.05.01  1
        5       2021.05.02  1
      Line_Item
        l_okey  l_name    l_price  l_shipdate
        1       desk      100      2021.05.07
        1       chair     50       2021.05.03
        1       monitor   130      2021.05.03
        1       mouse     10       2021.05.07
        3       shirt     30       2021.05.02
        5       kayak     200      2021.05.07
        5       lifevest  20       2021.05.02
    Node 2
      Order
        o_okey  o_date      o_pri
        2       2021.05.01  1
        4       2021.05.02  3
      Line_Item
        l_okey  l_name    l_price  l_shipdate
        2       pot       20       2021.05.01
        2       pan       25       2021.05.04
        4       bike      120      2021.05.04
        4       helmet    30       2021.05.10
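The modulo sharding on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular database does it: `shard_rows` is a hypothetical helper, and the rows are the Order table from slide 8.

```python
from collections import defaultdict


def shard_rows(rows, key_index, num_nodes):
    """Group rows into shards using key % num_nodes (hypothetical helper)."""
    shards = defaultdict(list)
    for row in rows:
        shards[row[key_index] % num_nodes].append(row)
    return dict(shards)


# Order rows as (o_okey, o_date, o_pri), taken from slide 8.
orders = [
    (1, "2021.05.01", 2),
    (2, "2021.05.01", 1),
    (3, "2021.05.02", 1),
    (4, "2021.05.02", 3),
    (5, "2021.05.02", 1),
]

order_shards = shard_rows(orders, key_index=0, num_nodes=2)

# Remainder 1 holds the odd keys (Node 1 on the slide),
# remainder 0 holds the even keys (Node 2 on the slide).
odd_keys = [row[0] for row in order_shards[1]]
even_keys = [row[0] for row in order_shards[0]]
```

Applying the same function to Line_Item on l_okey puts every line item on the node that already holds its order.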
  • 10. Examples: 2-node cluster
    Sharded: Order: (o_okey % 2) & Line_Item: (l_okey % 2)
    Partitioned: Order: (o_date) & Line_Item: (l_shipdate)
    Node 1
      Order
        o_okey  o_date      o_pri
        1       2021.05.01  2
        3       2021.05.01  1
        5       2021.05.02  1
      Line_Item
        l_okey  l_name    l_price  l_shipdate
        3       shirt     30       2021.05.02
        5       lifevest  20       2021.05.02
        1       chair     50       2021.05.03
        1       monitor   130      2021.05.03
        1       desk      100      2021.05.07
        1       mouse     10       2021.05.07
        5       kayak     200      2021.05.07
    Node 2
      Order
        o_okey  o_date      o_pri
        2       2021.05.01  1
        4       2021.05.02  3
      Line_Item
        l_okey  l_name    l_price  l_shipdate
        2       pot       20       2021.05.01
        2       pan       25       2021.05.04
        4       bike      120      2021.05.04
        4       helmet    30       2021.05.10
  • 11. Examples: 2-node cluster
    Sharded: Order: (o_okey % 2) & Line_Item: (l_okey % 2)
    Partitioned: Order: (o_date) & Line_Item: (l_shipdate)
    Encoded & Sorted: Order: (o_okey) & Line_Item: RLE(l_okey)
    Node 1
      Order
        o_okey  o_date      o_pri
        1       2021.05.01  2
        3       2021.05.01  1
        5       2021.05.02  1
      Line_Item (l_okey stored as RLE pairs of (value, run length))
        l_okey  l_name    l_price  l_shipdate
        (3,1)   shirt     30       2021.05.02
        (5,1)   lifevest  20       2021.05.02
        (1,2)   chair     50       2021.05.03
                monitor   130      2021.05.03
        (1,2)   desk      100      2021.05.07
                mouse     10       2021.05.07
        (5,1)   kayak     200      2021.05.07
    Node 2
      Order
        o_okey  o_date      o_pri
        2       2021.05.01  1
        4       2021.05.02  3
      Line_Item (l_okey stored as RLE pairs of (value, run length))
        l_okey  l_name    l_price  l_shipdate
        (2,1)   pot       20       2021.05.01
        (2,1)   pan       25       2021.05.04
        (4,1)   bike      120      2021.05.04
        (4,1)   helmet    30       2021.05.10
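The (value, run length) pairs on this slide are run-length encoding (RLE). A minimal Python sketch, assuming each partition is encoded separately: `rle_encode` and `rle_decode` are hypothetical helper names, and the input lists are the l_okey values of the three Node 1 partitions in sorted order.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original values."""
    return [value for value, count in runs for _ in range(count)]


# l_okey values per l_shipdate partition on Node 1 (05.02, 05.03, 05.07),
# each partition sorted on l_okey before encoding.
node1_partitions = [[3, 5], [1, 1], [1, 1, 5]]

encoded = [rle_encode(partition) for partition in node1_partitions]
decoded = [rle_decode(runs) for runs in encoded]
```

Runs never cross a partition boundary here, which is why key 1 shows up as two separate (1,2) runs instead of one (1,4) run.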
  • 12. Impacts of the setups on query performance
  • 13. Examples: Query
    SELECT l_okey, sum(l_price) AS revenue, o_date, o_pri
    FROM orders, lineitem
    WHERE l_okey = o_okey
      AND o_date < '2021.05.02'
      AND l_shipdate > '2021.05.03'
    GROUP BY l_okey, o_date, o_pri
    ORDER BY revenue DESC, o_date;
  • 15. Back to Shard setup
    Sharded: Order: (o_okey % 2) & Line_Item: (l_okey % 2)
    [Same 2-node data layout as slide 9: odd order keys on Node 1, even order keys on Node 2.]
  • 16. Examples: Query - Do the shards help?
    SELECT l_okey, sum(l_price) AS revenue, o_date, o_pri
    FROM orders, lineitem
    WHERE l_okey = o_okey
      AND o_date < '2021.05.02'
      AND l_shipdate > '2021.05.03'
    GROUP BY l_okey, o_date, o_pri
    ORDER BY revenue DESC, o_date;
    YES
    ● Join: l_okey = o_okey
      ○ → All odd keys are on node 1 and all even keys are on node 2
      ○ → Node 1 and node 2 each join their own local data. No need to shuffle data between nodes before joining.
    ● Group By: l_okey, o_date, o_pri
      ○ → Similarly, rows with the same group-by key are on the same node, so each node can aggregate its data without reshuffling
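To make the "no shuffle" point concrete, here is a small Python sketch of the join and group-by running independently on each node. It leaves out the date filters, `local_join_and_sum` is a hypothetical helper, and the rows come from the shard setup on slide 9 with the ship date dropped from the line items for brevity.

```python
from collections import defaultdict


def local_join_and_sum(orders, line_items):
    """Join on the order key and sum l_price per group, using one node's rows only."""
    order_by_key = {o_okey: (o_date, o_pri) for o_okey, o_date, o_pri in orders}
    revenue = defaultdict(int)
    for l_okey, _l_name, l_price in line_items:
        if l_okey in order_by_key:
            o_date, o_pri = order_by_key[l_okey]
            revenue[(l_okey, o_date, o_pri)] += l_price
    return dict(revenue)


# Node 1 holds the odd keys, Node 2 the even keys.
node1_orders = [(1, "2021.05.01", 2), (3, "2021.05.01", 1), (5, "2021.05.02", 1)]
node1_items = [(1, "desk", 100), (1, "chair", 50), (1, "monitor", 130),
               (1, "mouse", 10), (3, "shirt", 30), (5, "kayak", 200),
               (5, "lifevest", 20)]
node2_orders = [(2, "2021.05.01", 1), (4, "2021.05.02", 3)]
node2_items = [(2, "pot", 20), (2, "pan", 25), (4, "bike", 120), (4, "helmet", 30)]

node1_result = local_join_and_sum(node1_orders, node1_items)
node2_result = local_join_and_sum(node2_orders, node2_items)

# The group keys of the two nodes never overlap, so the final answer is
# simply the union of the per-node results; no partial sums need merging.
combined = {**node1_result, **node2_result}
```

Because both tables are sharded on the join key, neither node ever needs a row that lives on the other node.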
  • 18. Examples: Query - Do the shards help?
    SELECT l_okey, sum(l_price) AS revenue, o_date, o_pri
    FROM orders, lineitem
    WHERE l_okey = o_okey
      AND o_date < '2021.05.02'
      AND l_shipdate > '2021.05.03'
    GROUP BY l_okey, o_date, o_pri
    ORDER BY revenue DESC, o_date;
    What if Order is not sharded on o_okey & Line_Item is not sharded on l_okey?
    ● Join: l_okey = o_okey
      ○ → Need to reshuffle data so that equal join keys land on the same node before joining. Several ways:
        ■ Reshard on the fly both Order on o_okey and Line_Item on l_okey
        ■ Broadcast the small table (Order) to the other nodes
    ● Group By: l_okey, o_date, o_pri
      ○ → If the data is sharded on l_okey after the join, nothing is needed. Otherwise, either:
        ■ Reshard the data on l_okey across the 2 nodes
        ■ Send everything to one node to do the final group-by
  • 19. Examples: Query - Do the shards help?
    SELECT l_okey, sum(l_price) AS revenue, o_date, o_pri
    FROM orders, lineitem
    WHERE l_okey = o_okey
      AND o_date < '2021.05.02'
      AND l_shipdate > '2021.05.03'
    GROUP BY l_okey, o_date, o_pri
    ORDER BY revenue DESC, o_date;
    What if Order is not sharded on o_okey & Line_Item is not sharded on l_okey?
    ● → Not being sharded on the join keys leads to extra on-the-fly reshard or broadcast cost
    ● → Not being (re-)sharded on the group-by keys before the group-by operator leads to either
      ○ A reshard, or
      ○ The final node having to do all the group-by work
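The extra cost of an on-the-fly reshard can be sketched by counting how many rows have to change nodes. This is a simplified illustration: `reshard` is a hypothetical helper, nodes are numbered from 0, and the starting layout is an assumed arbitrary placement, not one from the slides.

```python
def reshard(node_rows, key_index, num_nodes):
    """Redistribute rows so that equal keys land on the same node.

    Returns the new per-node layout and the number of rows that had to be
    sent to a different node (a rough stand-in for network cost).
    """
    new_layout = [[] for _ in range(num_nodes)]
    moved = 0
    for source_node, rows in enumerate(node_rows):
        for row in rows:
            target_node = row[key_index] % num_nodes
            new_layout[target_node].append(row)
            if target_node != source_node:
                moved += 1
    return new_layout, moved


# Assumed starting point: Line_Item rows placed without regard to l_okey,
# so rows of order 1 and order 2 are split across both nodes.
unsharded = [
    [(1, "desk", 100), (2, "pot", 20), (4, "bike", 120)],   # node 0
    [(1, "chair", 50), (3, "shirt", 30), (2, "pan", 25)],   # node 1
]

resharded, rows_moved = reshard(unsharded, key_index=0, num_nodes=2)
```

When the tables are already sharded on the join key, `rows_moved` is zero, which is the saving described on slide 16.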
  • 21. Back to Partition Setup
    Sharded: Order: (o_okey % 2) & Line_Item: (l_okey % 2)
    Partitioned: Order: (o_date) & Line_Item: (l_shipdate)
    [Same 2-node data layout as slide 10: within each node, Order rows are grouped by o_date and Line_Item rows by l_shipdate.]
  • 22. Examples: Query - Do the partitions help?
    SELECT l_okey, sum(l_price) AS revenue, o_date, o_pri
    FROM orders, lineitem
    WHERE l_okey = o_okey
      AND o_date < '2021.05.02'
      AND l_shipdate > '2021.05.03'
    GROUP BY l_okey, o_date, o_pri
    ORDER BY revenue DESC, o_date;
    Yes
    ● Filter: o_date < 2021.05.02 and l_shipdate > 2021.05.03
      ○ → Prune partitions that fall outside the filter ranges
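Partition pruning can be sketched as filtering on the partition key before touching any rows. This is an illustration only: `prune_partitions` is a hypothetical helper, and the data is modeled on the Node 1 Line_Item partitions, assuming the shirt row belongs to the 2021.05.02 partition. Dates in the fixed-width `YYYY.MM.DD` form compare correctly as plain strings.

```python
def prune_partitions(partitions, predicate):
    """Keep only the partitions whose partition key satisfies the predicate."""
    return {key: rows for key, rows in partitions.items() if predicate(key)}


# Node 1 Line_Item rows as (l_okey, l_name, l_price), keyed by l_shipdate.
node1_line_item = {
    "2021.05.02": [(3, "shirt", 30), (5, "lifevest", 20)],
    "2021.05.03": [(1, "chair", 50), (1, "monitor", 130)],
    "2021.05.07": [(1, "desk", 100), (1, "mouse", 10), (5, "kayak", 200)],
}

# Filter from the query: l_shipdate > 2021.05.03
kept = prune_partitions(node1_line_item, lambda shipdate: shipdate > "2021.05.03")

total_rows = sum(len(rows) for rows in node1_line_item.values())
rows_to_scan = sum(len(rows) for rows in kept.values())
```

Only one of the three partitions survives, so the scan reads 3 rows instead of 7 and never opens the other two partitions.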
  • 24. Examples: Query - Do the partitions help?
    SELECT l_okey, sum(l_price) AS revenue, o_date, o_pri
    FROM orders, lineitem
    WHERE l_okey = o_okey
      AND o_date < '2021.05.02'
      AND l_shipdate > '2021.05.03'
    GROUP BY l_okey, o_date, o_pri
    ORDER BY revenue DESC, o_date;
    What if Order is not partitioned on o_date and Line_Item is not partitioned on l_shipdate?
    ● → Nothing can be pruned early; we have to scan all the column data and apply the filter ranges
  • 26. Back to Encoding and Sorting Setup
    Sharded: Order: (o_okey % 2) & Line_Item: (l_okey % 2)
    Partitioned: Order: (o_date) & Line_Item: (l_shipdate)
    Encoded & Sorted: Order: (o_okey) & Line_Item: RLE(l_okey)
    [Same 2-node data layout as slide 11: Order sorted on o_okey, Line_Item l_okey stored as RLE (value, run length) pairs.]
  • 27. Examples: Query - Do the encoding & sorting help?
    SELECT l_okey, sum(l_price) AS revenue, o_date, o_pri
    FROM orders, lineitem
    WHERE l_okey = o_okey
      AND o_date < '2021.05.02'
      AND l_shipdate > '2021.05.03'
    GROUP BY l_okey, o_date, o_pri
    ORDER BY revenue DESC, o_date;
    Yes
    ● Join: l_okey = o_okey
      ○ → Use the fast & more memory-efficient merge join because the data is already sorted on the join keys
      ○ → l_okey can be kept in RLE during the join
    ● Group By: l_okey, o_date, o_pri
      ○ → The group-by key is sorted, so no hash group-by is needed: simply aggregate rows as new batches arrive until a higher key value appears
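A merge join with a pipelined group-by can be sketched as a single pass over two sorted inputs. This is a simplified illustration, not an engine implementation: `merge_join_sum` is a hypothetical helper, it assumes order keys are unique and both inputs are sorted ascending on the key, and it ignores the date filters and RLE.

```python
def merge_join_sum(orders, line_items):
    """Merge-join two key-sorted inputs and sum prices per order in one pass."""
    results = []
    i = 0
    for o_okey, o_date, o_pri in orders:
        # Skip line items whose key is smaller than the current order key.
        while i < len(line_items) and line_items[i][0] < o_okey:
            i += 1
        total = 0
        matched = False
        # Aggregate until a higher key value appears; no hash table is needed.
        while i < len(line_items) and line_items[i][0] == o_okey:
            total += line_items[i][1]
            matched = True
            i += 1
        if matched:
            results.append((o_okey, total, o_date, o_pri))
    return results


# Node 2 data: Order sorted on o_okey, Line_Item as (l_okey, l_price) sorted on l_okey.
node2_orders = [(2, "2021.05.01", 1), (4, "2021.05.02", 3)]
node2_items = [(2, 20), (2, 25), (4, 120), (4, 30)]

node2_revenue = merge_join_sum(node2_orders, node2_items)
```

Each input is read once and only one running total is held in memory at a time, which is where the memory advantage over hashing comes from.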
  • 29. Examples: Query - Do the encoding & sorting help?
    SELECT l_okey, sum(l_price) AS revenue, o_date, o_pri
    FROM orders, lineitem
    WHERE l_okey = o_okey
      AND o_date < '2021.05.02'
      AND l_shipdate > '2021.05.03'
    GROUP BY l_okey, o_date, o_pri
    ORDER BY revenue DESC, o_date;
    What if Order is not sorted on o_okey and Line_Item is not RLE on l_okey?
    ● → Use hash join instead (usually slower and requires more memory than merge join)
    ● → Use hash group-by (similarly, usually slower and requires more memory than pipelined group-by)
    ● → If there are only a few line items per order, RLE won’t save much space
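The last point, that RLE saves little when runs are short, can be checked by counting runs. A rough sketch with made-up key columns: `rle_run_count` is a hypothetical helper, and storage is approximated as two numbers (value and run length) per run.

```python
from itertools import groupby


def rle_run_count(sorted_keys):
    """Number of (value, run_length) pairs RLE needs for a sorted key column."""
    return sum(1 for _ in groupby(sorted_keys))


# Assumed data: 200 line items in both cases.
many_items_per_order = [1] * 100 + [2] * 100                           # 2 orders, 100 items each
few_items_per_order = [k for k in range(1, 101) for _ in range(2)]     # 100 orders, 2 items each

runs_many = rle_run_count(many_items_per_order)
runs_few = rle_run_count(few_items_per_order)

# Each run stores two numbers, so compare 2 * runs against the raw 200 values.
stored_many = 2 * runs_many
stored_few = 2 * runs_few
```

With 100 items per order the column shrinks from 200 values to 4 stored numbers; with 2 items per order it still needs 200, so there is no saving.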
  • 30. Database Designer: How to design sharding, partitioning, encoding, and sorting for a combination of queries?
    ● Topic for another talk
    ● Startup: Ottertune https://ottertune.com
      ○ Database Optimization on Autopilot
  • 31. So what have we demonstrated today?
    ● Sharding
      ○ Horizontally split a table into N non-overlapping shards
        ■ → each node takes an (equal) 1/N share of the workload:
          ● Load: 1/N of the data goes to each node
          ● Query: each node does 1/N of the join & group-by work
    ● Partitioning
      ○ Each shard is further split into smaller partitions for better data filtering, deleting, fanning out, and local parallelism
    ● Encoding
      ○ Each column is encoded (sorted & compressed) to further help with join, filtering, group-by, and order-by
    → Can you think of examples for the cases we have not covered?