Introduction to Parallel Processing Algorithms in Shared Nothing Databases
Ofir Manor
Agenda
• Introduction
• Sample Architecture
• The optimizer and execution plans
• Examples of single table processing
• Examples of Join processing
Scaling Databases
• Scaling – expanding a system to support more data / sessions.
• Best scalability – linear, predictable.

• Scale-up (bigger server) vs. Scale-out (more servers)
• Scaling up – easier, but limited, expensive

• Most common scale-out strategy – Sharding
• Spreading the data (rows in a table) across many independent nodes
• Each node has a different subset of the data – Shared Nothing

• Processing sharded data across a shared-nothing cluster is also called
Massively Parallel Processing (MPP)
• MPP databases have been around since the 80s (ex: Teradata), and became popular in
the analytic space in the 2000s (ex: Netezza, Greenplum, Vertica)
• Open source examples over Hadoop – Hive(*), Impala
Sample MPP database architecture

[Architecture diagram: (1) a SQL client sends SQL to the Master Node, which holds the data dictionary, sessions, and the optimizer. (2) The master sends an execution plan to the shards (Shard 001, Shard 002, … Shard nnn); each table is distributed across all shards. (3) The shards execute the SQL in parallel, (4) return their results to the master, and (5) the master returns the final results to the client.]
Processing – Analytical vs. Operational
• With MongoDB – most operations involve a single document
• With SQL – most operations involve processing many rows, likely
across all shards
• Example: sum of sales per day, per store

• Also, SQL is more expressive – it has a rich set of complex operations
(joining, aggregating, sorting, etc.)
• A database optimizer builds an execution plan:
• The access path per table (full scan, index scan etc)
• The order of the joins
• The type of each join (multiple algorithms)
Execution Plan – Sample Table
• Syntax and execution plans are based on Greenplum – but the lessons are general.

• We’ll start with a simple, single table with no indexes.
• It holds data about calls
• CREATE TABLE calls
(subscriber_id integer,
call_date date,
call_length integer)
DISTRIBUTED BY (subscriber_id);

• We can control the sharding key (distribution key) – this will later allow
some join optimizations.
• Row Placement: shard number = hash(subscriber_id) mod (# of shards) – see the sketch below
• Generally, we want data to be spread equally across all shards (no skew)
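
As an illustration, here is a minimal Python sketch of that row-placement rule. The md5-based hash and the shard count are assumptions for illustration, not Greenplum's actual hash function:

import hashlib

NUM_SHARDS = 8  # assumption for illustration

def shard_for(subscriber_id):
    # Illustrative stand-in for the engine's hash function.
    digest = hashlib.md5(str(subscriber_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every row with the same distribution-key value lands on the same shard:
for subscriber_id, call_date, call_length in [
        (42, '2013-11-01', 35),
        (42, '2013-11-02', 610),
        (7, '2013-11-01', 12)]:
    print('subscriber', subscriber_id, '-> shard', shard_for(subscriber_id))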
Single Table Execution - Plan 1
• EXPLAIN SELECT * FROM calls
WHERE call_date BETWEEN '2013/11/01' AND '2013/11/30';
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
-> Seq Scan on calls
Filter: call_date >= '2013-11-01'::date AND
call_date <= '2013-11-30'::date

• Sequential Scan – a full scan of each table shard
• Filter – applied during the scan
• Gather Motion – moving the result set of each shard to the master
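
Putting the three operators together, a toy Python sketch of Plan 1, with shards modeled as in-memory lists and made-up rows:

from datetime import date

# Toy model: each shard holds its own slice of the calls table.
shards = [
    [(1, date(2013, 11, 5), 30), (9, date(2013, 10, 2), 45)],   # shard 0
    [(2, date(2013, 11, 20), 90), (2, date(2012, 1, 1), 15)],   # shard 1
]

def seq_scan_with_filter(rows):
    # The filter is applied while scanning each shard's slice.
    lo, hi = date(2013, 11, 1), date(2013, 11, 30)
    return [row for row in rows if lo <= row[1] <= hi]

# Gather Motion n:1 - the master simply collects every shard's result set.
result = [row for shard in shards for row in seq_scan_with_filter(shard)]
print(result)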
Single Table Execution - Plan 2
• EXPLAIN SELECT call_date, count(*)
FROM calls
WHERE call_length <= 60
GROUP BY call_date;
• Challenge – do the group by in parallel
• General case - could be millions or billions of groups

• Challenge – the rows for each group are distributed across all shards
• Conclusion – the processes in the shards need to communicate
Single Table Execution - Plan 2
[Diagram: Process Group 1 runs on every shard (Shard 001 … Shard nnn) and performs the local scan, filter, and partial aggregation. Its result set is re-distributed (streamed) over the cluster network (n:n) to Process Group 2, which performs the final aggregation of each group and sends the final results to the master.]
Single Table Execution - Plan 2
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
-> HashAggregate
Group By: calls.call_date
-> Redistribute Motion n:n
Hash Key: calls.call_date
-> HashAggregate
Group By: calls.call_date
-> Seq Scan on calls
Filter: call_length <= 60
• HashAggregate – computes the aggregate for each group using a hash table on the group key
• Redistribute Motion – redistributes the data across the shards to a new set of
processes
• Each row of the partial result set is sent to shard number = hash(call_date) mod (# of shards) – see the sketch below
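
A minimal Python sketch of this two-phase aggregation, assuming a toy hash(group key) mod (# of shards) routing rule and made-up rows:

from collections import Counter, defaultdict

NUM_SHARDS = 2  # assumption for illustration

def partial_aggregate(rows):
    # Phase 1, per shard: local scan, filter, and partial count per call_date.
    return Counter(d for d, length in rows if length <= 60)

def redistribute_and_finalize(partials):
    # Redistribute Motion n:n - route each partial count to the shard that
    # owns its group key, then finish the count there (phase 2).
    finals = [defaultdict(int) for _ in range(NUM_SHARDS)]
    for partial in partials:
        for d, count in partial.items():
            finals[hash(d) % NUM_SHARDS][d] += count
    return finals

shards = [
    [('2013-11-01', 30), ('2013-11-01', 90), ('2013-11-02', 10)],  # shard 0
    [('2013-11-01', 5), ('2013-11-02', 200), ('2013-11-02', 55)],  # shard 1
]
partials = [partial_aggregate(s) for s in shards]
for shard_no, final in enumerate(redistribute_and_finalize(partials)):
    print('shard', shard_no, dict(final))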
Single Table Execution - Plan 3
• EXPLAIN SELECT call_date, count(*)
FROM calls
WHERE call_length <= 60
GROUP BY call_date
ORDER BY call_date;
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
Merge Key: call_date
-> Sort
Sort Key: partial_aggregation.call_date
-> HashAggregate
Group By: calls.call_date
-> Redistribute Motion n:n
Hash Key: calls.call_date
-> HashAggregate
Group By: calls.call_date
-> Seq Scan on calls
Filter: call_length <= 60
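
The new pieces are the Sort on each shard and the Merge Key on the Gather Motion: each shard sorts its own result set, and the master merges the pre-sorted streams instead of re-sorting everything. A sketch of that merge step in Python, with made-up per-shard results:

import heapq

# After the final HashAggregate each call_date lives on exactly one shard,
# and each shard sorts its own (call_date, count) result set locally.
shard_results = [
    [('2013-11-01', 120), ('2013-11-03', 80)],   # sorted output of shard 0
    [('2013-11-02', 45), ('2013-11-04', 60)],    # sorted output of shard 1
]

# Gather Motion with a Merge Key - an n-way merge of pre-sorted streams,
# cheaper than a full sort on the master.
for row in heapq.merge(*shard_results, key=lambda r: r[0]):
    print(row)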
Execution Plan – A Second Table
• Let’s add a second table so we can have some joins.
• It holds details of each subscriber
• CREATE TABLE subscribers
(subscriber_id integer,
subscriber_city_code integer)
DISTRIBUTED BY (subscriber_id);

• To start with, both tables have the same distribution key
• So, all the rows of any specific subscriber, from both tables, will be hosted
in the same shard.
• We can leverage this knowledge in our algorithm
• Later we will see what happens if this is not the case
Simple Join 1 – Same Distribution Key
• EXPLAIN SELECT s.subscriber_id, s.subscriber_city_code,
c.call_date, c.call_length
FROM calls c JOIN subscribers s
ON(c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code = 4;
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
-> Hash Join
Hash Cond: c.subscriber_id = s.subscriber_id
-> Seq Scan on calls c
-> Hash
-> Seq Scan on subscribers s
Filter: subscriber_city_code = 4
• Hash Join – joins two tables
• The first table (here, the filtered subscribers) is processed and its result set is hashed on the join key
• The second table (calls) is then scanned and joined to the first using hash lookups – see the sketch below
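
A minimal Python sketch of the per-shard build-and-probe, with made-up rows (in the plan above, the filtered subscribers result is the hashed side and calls is the probed side):

from collections import defaultdict

# One shard's local slices of both tables (made-up rows).
subscribers = [(1, 4), (2, 7), (3, 4)]      # (subscriber_id, subscriber_city_code)
calls = [(1, '2013-11-01', 30), (3, '2013-11-02', 90), (2, '2013-11-02', 10)]

# Build phase: hash the filtered subscribers side on the join key.
build = defaultdict(list)
for subscriber_id, city_code in subscribers:
    if city_code == 4:                       # Filter: subscriber_city_code = 4
        build[subscriber_id].append(city_code)

# Probe phase: scan calls and look up each row's join key in the hash table.
for subscriber_id, call_date, call_length in calls:
    for city_code in build.get(subscriber_id, ()):
        print(subscriber_id, city_code, call_date, call_length)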
Simple Join 2 – Same Distribution Key
• EXPLAIN SELECT c.call_date, s.subscriber_city_code,
count(*), sum(c.call_length)
FROM calls c JOIN subscribers s
ON (c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code IN (9,99,999)
AND call_date BETWEEN '2012/01/04' AND '2012/01/06'
GROUP BY 1,2
ORDER BY c.call_date, sum(c.call_length) DESC;

• Nothing new – just a mix of all we’ve seen
Simple Join 2 – Same Distribution Key
QUERY PLAN
-------------------------------------------------
Gather Motion n:1
Merge Key: call_date, sum
-> Sort
Sort Key: partial_aggregation.call_date, sum
-> HashAggregate
Group By: c.call_date, s.subscriber_city_code
-> Redistribute Motion n:n
Hash Key: c.call_date, s.subscriber_city_code
-> HashAggregate
Group By: c.call_date, s.subscriber_city_code
-> Hash Join
Hash Cond: c.subscriber_id = s.subscriber_id
-> Seq Scan on calls c
Filter: call_date >= '2012-01-04'::date AND
call_date <= '2012-01-06'::date
-> Hash
-> Seq Scan on subscribers s
Filter: subscriber_city_code =
ANY ('{9,99,999}'::integer[])
Simple Join 1 – Different Distribution Key
• What if the subscribers table was distributed differently?
• ALTER TABLE subscribers
SET DISTRIBUTED BY(subscriber_city_code);
• Now our data about subscribers is laid out differently
• The set of subscribers in shard 1 of the calls table is not the same as in the subscribers table

• How do we run the Simple Join 1 query from before?
• Now, there has to be some shuffling of data over the network
• To minimize the work, it is better to shuffle the smaller table over the network
• Since the join key of the calls table is still its distribution key (subscriber_id), we
can send each row from the subscribers result set directly to the right shard – see the sketch below.
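
A sketch of that directed redistribution in Python; the routing hash and rows are illustrative:

NUM_SHARDS = 4  # assumption for illustration

def owning_shard(subscriber_id):
    # The shard that holds this subscriber's calls rows (illustrative hash).
    return hash(subscriber_id) % NUM_SHARDS

# The (smaller) filtered subscribers result set - city_code = 4 survivors.
filtered_subscribers = [(17, 4), (23, 4), (99, 4)]

# Redistribute Motion: send each row straight to the shard that owns its
# subscriber_id; the big calls table never moves.
outboxes = [[] for _ in range(NUM_SHARDS)]
for row in filtered_subscribers:
    outboxes[owning_shard(row[0])].append(row)

# Each target shard can now hash-join its inbox against its local calls rows.
for shard_no, inbox in enumerate(outboxes):
    if inbox:
        print('shard', shard_no, 'receives', inbox)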
Simple Join 1 – Different Distribution Key
• EXPLAIN SELECT s.subscriber_id, s.subscriber_city_code,
c.call_date, c.call_length
FROM calls c JOIN subscribers s
ON(c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code = 4;
• Same query as Simple Join 1!
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
-> Hash Join
Hash Cond: c.subscriber_id = s.subscriber_id
-> Seq Scan on calls c
-> Hash
-> Redistribute Motion 1:n
Hash Key: s.subscriber_id
-> Seq Scan on subscribers s
Filter: subscriber_city_code = 4
Simple Join 2 – Different Distribution Key
• EXPLAIN SELECT c.call_date, s.subscriber_city_code,
count(*), sum(c.call_length)
FROM calls c JOIN subscribers s
ON (c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code IN (9,99,999)
AND call_date BETWEEN '2012/01/04' AND '2012/01/06'
GROUP BY 1,2
ORDER BY c.call_date, sum(c.call_length) DESC;

• Same query as Simple Join 2 – just different distribution
Simple Join 2 – Different Distribution Key

QUERY PLAN
-------------------------------------------------
Gather Motion n:1
Merge Key: call_date, sum
-> Sort
Sort Key: partial_aggregation.call_date, sum
-> HashAggregate
Group By: c.call_date, s.subscriber_city_code
-> Redistribute Motion n:n
Hash Key: c.call_date, s.subscriber_city_code
-> HashAggregate
Group By: c.call_date, s.subscriber_city_code
-> Hash Join
Hash Cond: c.subscriber_id = s.subscriber_id
-> Seq Scan on calls c
Filter: call_date >= '2012-01-04'::date AND
call_date <= '2012-01-06'::date
-> Hash
-> Redistribute Motion n:n
Hash Key: s.subscriber_id
-> Seq Scan on subscribers s
Filter: subscriber_city_code =
ANY ('{9,99,999}'::integer[])
Teasers
• EXPLAIN SELECT * FROM calls
ORDER BY call_length DESC
LIMIT 10;
(Easy - top 10 calls by length)
• EXPLAIN SELECT call_date, count(*)
FROM calls WHERE call_length <= 60
GROUP BY call_date
HAVING count(*) >= 1000000
ORDER BY call_date;
(Easy – all days with at least a million short calls – HAVING clause)
• EXPLAIN SELECT call_date, count(distinct subscriber_id)
FROM calls GROUP BY call_date;
(Hard – per day, the number of subscribers with calls)
• EXPLAIN SELECT call_date,
count(distinct subscriber_id),
count(distinct call_length)
FROM calls GROUP BY call_date;
(Very Hard – two DISTINCT aggregations)
