Load Balancing in Distributed Database
Md. Shamsur Rahim 14-98181-3 Student, MScCS, AIUB
AZM Ehtesham Chowdhury 15-98451-1 Student, MScCS, AIUB
Saiful Akhter 15-98502-1 Student, MScCS, AIUB
Load Balancing:
• Means distributing transactions and queries among different nodes.
• The goal is to maximize throughput.
• Parallel execution problems:
  1. Initialization
  2. Interference
  3. Skew
Parallel Execution Problems: Initialization
• Initialization is necessary before execution.
• This sequential step includes:
  • Process/thread creation and initialization
  • Communication initialization, etc.
• Its duration is proportional to the degree of parallelism.
• The degree of parallelism should be fixed according to query complexity.
• Formula for the response time of an operator:
  $$\mathit{ResponseTime} = a \cdot n + \frac{c \cdot N}{n}$$
  where $N$ = number of tuples, $c$ = average processing time per tuple, $n$ = number of processors, and $a$ = average initialization time per processor.
• Setting the derivative with respect to $n$ to zero gives the optimal number of processors to allocate ($n_0$) and the maximal achievable speedup ($S_0$):
  $$n_0 = \sqrt{\frac{c \cdot N}{a}}, \qquad S_0 = \frac{n_0}{2}$$
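The response-time formula above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the sample values for N, c, and a are assumptions chosen for illustration, and the speedup is measured against a sequential time of c * N (startup cost excluded), which is the baseline under which the speedup equals half the optimal processor count.

```python
import math


def response_time(n: int, N: int, c: float, a: float) -> float:
    """Startup cost a*n plus the per-processor share of the work, c*N/n."""
    return a * n + (c * N) / n


def optimal_processors(N: int, c: float, a: float) -> float:
    """Value of n that minimizes response_time: sqrt(c*N/a)."""
    return math.sqrt(c * N / a)


def max_speedup(N: int, c: float, a: float) -> float:
    """Sequential time c*N divided by the best parallel response time."""
    n0 = optimal_processors(N, c, a)
    return (c * N) / response_time(n0, N, c, a)


# Assumed sample values: 2000 tuples, 0.5 ms per tuple, 10 ms startup per processor.
N, c, a = 2000, 0.5, 10.0
n0 = optimal_processors(N, c, a)      # sqrt(1000 / 10) = 10 processors
best = response_time(10, N, c, a)     # 100 ms startup + 100 ms work = 200 ms
speedup = max_speedup(N, c, a)        # 1000 / 200 = 5, i.e. n0 / 2
```

Adding processors beyond the optimum makes the operator slower, because the startup term a * n then grows faster than the work term shrinks.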
Parallel Execution Problems: Interference
• Parallel execution can be slowed down by interference.
• Interference occurs when several processors simultaneously access the same resource, which may be:
  • Hardware
    • Solution: duplicate the shared resource.
  • Software
    • Solution: partition the shared resource into several independent resources.
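The software solution (partitioning a shared resource into independent resources) can be illustrated with a shared counter table. This is a hedged sketch only: `PartitionedCounter` is a hypothetical class, and the choice of one lock per partition is an assumption about how the partitioning might be realized.

```python
import threading


class PartitionedCounter:
    """A counter table split into independent partitions, each with its own lock.

    Threads that touch keys in different partitions never wait for each other,
    unlike a design with a single global lock.
    """

    def __init__(self, partitions: int = 8):
        self._locks = [threading.Lock() for _ in range(partitions)]
        self._counts = [dict() for _ in range(partitions)]

    def _index(self, key) -> int:
        return hash(key) % len(self._locks)

    def increment(self, key) -> None:
        i = self._index(key)
        with self._locks[i]:
            self._counts[i][key] = self._counts[i].get(key, 0) + 1

    def get(self, key) -> int:
        i = self._index(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)


def hammer(counter: PartitionedCounter, key, times: int) -> None:
    for _ in range(times):
        counter.increment(key)


counter = PartitionedCounter(partitions=4)
workers = [threading.Thread(target=hammer, args=(counter, k, 1000))
           for k in ("a", "b", "a", "b")]
for w in workers:
    w.start()
for w in workers:
    w.join()
```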
Parallel Execution Problems: Skew
• The load-balancing problem that appears with intra-operator parallelism (variation in partition size) is known as data skew.
• Classification of skew:
  • Attribute value skew: inherent in the dataset
    • e.g., there are more citizens in Paris than in Waterloo
  • Tuple placement skew: introduced when the data are initially partitioned
    • e.g., with range partitioning
  • Selectivity skew
    • introduced when the selectivity of select predicates varies from node to node
  • Redistribution skew
    • occurs in the redistribution step between two operators
  • Join product skew
    • occurs because the join selectivity may vary between nodes
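To make the effect of skew concrete, the sketch below partitions a small dataset by city and measures how uneven the partitions are. The "skew factor" (largest partition divided by average partition size) is an assumed illustrative metric, not one defined in the slides, and the 90/10 split between the two cities is made-up data.

```python
def partition_sizes(keys, assign, num_nodes):
    """Count how many tuples each node receives under the given assignment."""
    sizes = [0] * num_nodes
    for key in keys:
        sizes[assign(key)] += 1
    return sizes


def skew_factor(sizes):
    """Largest partition relative to the average; 1.0 means perfectly balanced."""
    return max(sizes) / (sum(sizes) / len(sizes))


def by_city(city):
    """Assumed placement rule: Paris tuples on node 0, everything else on node 1."""
    return 0 if city == "Paris" else 1


# Made-up data showing attribute value skew: far more Paris tuples than Waterloo.
cities = ["Paris"] * 90 + ["Waterloo"] * 10
sizes = partition_sizes(cities, by_city, 2)   # [90, 10]
skew = skew_factor(sizes)                     # 90 / 50 = 1.8
```

With a skew factor of 1.8, the node holding the Paris tuples does almost twice the average work, so it determines the operator's response time.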
Inter-Query Parallelism
• A form of parallelism in which many different queries or transactions are executed in parallel with one another on many processors.
• Advantages:
  • Increases transaction throughput.
  • Scales up the transaction processing system.
  • Easy to implement in a shared-memory parallel system.
• Examples: Oracle 8 and Oracle Rdb.
Intra-Query Parallelism
• A form of parallelism in which a single query is executed in parallel on many processors.
• Two types:
  • Intra-operation parallelism
  • Inter-operation parallelism
• Advantages:
  • Speeds up a single complex, long-running query.
  • Best suited for complex scientific calculations (queries).
• Examples: Informix, Teradata.
Intra-operation parallelism
• The process of speeding up a query by parallelizing the execution of individual operations.
• Operations that can be parallelized include sort, join, projection, and selection.
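As an illustration of intra-operation parallelism, the sketch below runs one selection operator over several partitions at once. The function names are hypothetical, and a thread pool is used only to show the structure: in CPython, threads do not give real CPU parallelism for pure-Python predicates, so a process pool or a native engine would be needed for actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def select_partition(partition, predicate):
    """Apply the selection predicate to one partition of the relation."""
    return [row for row in partition if predicate(row)]


def parallel_select(partitions, predicate):
    """Run the same selection on every partition concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        # pool.map returns results in partition order.
        results = pool.map(lambda part: select_partition(part, predicate), partitions)
        return [row for part in results for row in part]


evens = parallel_select([[1, 2, 3], [4, 5, 6]], lambda x: x % 2 == 0)
```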
Inter-operation parallelism
• The process of speeding up a query by executing its different operations in parallel.
• Example:
  • A query that joins 4 tables is executed on two processors.
  • Each processor joins two relations locally; the intermediate results (result1 and result2) are then joined to produce the final result.
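The four-table example can be sketched as follows. This is an illustrative assumption rather than the slides' own code: relations are lists of dictionaries, all joins use the same key, and `hash_join` and `four_way_join` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_join(left, right, key):
    """Join two lists of dicts on a shared key using an in-memory hash table."""
    table = {}
    for row in left:
        table.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def four_way_join(r1, r2, r3, r4, key):
    """Join r1 with r2 and r3 with r4 on two workers, then join the two results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future1 = pool.submit(hash_join, r1, r2, key)
        future2 = pool.submit(hash_join, r3, r4, key)
        result1, result2 = future1.result(), future2.result()
    return hash_join(result1, result2, key)


r1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
r2 = [{"id": 1, "b": 10}]
r3 = [{"id": 1, "c": True}, {"id": 3, "c": False}]
r4 = [{"id": 1, "d": 0.5}]
joined = four_way_join(r1, r2, r3, r4, "id")
```

The two local joins are independent of each other, which is what allows them to run at the same time; only the final join has to wait for both.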
Intra-Operator Load Balancing
• Depends on:
  • The degree of parallelism.
  • The allocation of processors to the operator.
• The home of the operator (the set of processors where it is executed) must be carefully decided.
• The skew problem makes it hard for a parallel query optimizer to make this decision statically.
• A static decision would require a very accurate and detailed cost model.
• Two solutions, which can be incorporated in a hybrid query optimizer:
  • Adaptive techniques
  • Specialized techniques
Adaptive Technique
• The main idea is to statically decide on an initial allocation of processors to the operator (using a cost model).
• During execution, adapt to skew using load reallocation.
• Load reallocation detects the oversized partitions and partitions them again onto several processors.
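A minimal sketch of load reallocation, under stated assumptions: a partition counts as oversized when it holds more than `threshold` times the average number of tuples, and it is split into roughly average-sized pieces. Both the threshold and the splitting rule are illustrative choices, not taken from the slides.

```python
import math


def reallocate(partitions, threshold=1.5):
    """Split every oversized partition into several pieces; leave the rest as-is."""
    average = sum(len(p) for p in partitions) / len(partitions)
    result = []
    for part in partitions:
        if len(part) > threshold * average:
            pieces = math.ceil(len(part) / average)   # how many processors to spread over
            size = math.ceil(len(part) / pieces)      # tuples per piece
            result.extend(part[i:i + size] for i in range(0, len(part), size))
        else:
            result.append(part)
    return result


# One partition holds 80 of 100 tuples; the average is about 33.
skewed = [list(range(80)), list(range(10)), list(range(10))]
balanced = reallocate(skewed)
new_sizes = [len(p) for p in balanced]   # [27, 27, 26, 10, 10]
```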
Adaptive Technique (Continued)
• Advantages:
  • Allows more dynamic adjustment of the degree of parallelism.
  • Useful for improving intra-operator load balancing in all kinds of parallel architectures.
  • Achieves excellent load balancing for intra-operator parallelism by reducing processor interference.
Adaptive Technique (Continued)
• Another approach introduces specific control operators into the execution plan.
• They detect whether the static estimates for intermediate result sizes differ from the run-time values.
• When they do, the relation is redistributed in order to prevent join product skew and redistribution skew.
• Redistribution is triggered only if the difference between the estimate and the real value is sufficiently high.
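A control operator of this kind can be sketched as a pass-through iterator that counts tuples and compares the count with the optimizer's estimate. The class name, the relative-error test, and the default tolerance of 0.5 are all assumptions made for illustration; the slides only say the difference must be "sufficiently high".

```python
class ControlOperator:
    """Passes tuples through unchanged while tracking how many it has seen."""

    def __init__(self, source, estimated_size, tolerance=0.5):
        self.source = source
        self.estimated_size = estimated_size
        self.tolerance = tolerance      # assumed relative-error threshold
        self.seen = 0

    def __iter__(self):
        for row in self.source:
            self.seen += 1
            yield row

    def should_redistribute(self):
        """True when the run-time size deviates from the estimate by more than the tolerance."""
        if self.estimated_size == 0:
            return self.seen > 0
        error = abs(self.seen - self.estimated_size) / self.estimated_size
        return error > self.tolerance


far_off = ControlOperator(range(300), estimated_size=100)
rows = list(far_off)            # the real result has 300 tuples, three times the estimate
close = ControlOperator(range(110), estimated_size=100)
_ = list(close)                 # 110 tuples, only 10% above the estimate
```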
Specialized techniques
• Rely on two main techniques:
  • Range partitioning
  • Sampling
• Range partitioning is used instead of hash partitioning to avoid redistribution skew of the building relation.
• Processors thus get partitions with equal numbers of tuples, corresponding to different ranges of join attribute values.
Specialized techniques (Continued)
• Skew is handled as follows:
  1. Sample the building relation to determine the partitioning ranges.
  2. Redistribute the building relation to the processors using the ranges. Each processor builds a hash table containing the incoming tuples.
  3. Redistribute the probing relation to the processors using the same ranges. For each tuple received, each processor probes the hash table to perform the join.
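The three steps above can be sketched as a single-process simulation in which each "processor" is just an index into a list of hash tables. Everything here is an illustrative assumption: the function names, the equal-quantile rule for picking range boundaries, and the sample size. Note that range boundaries alone cannot split a single heavily repeated join value across processors.

```python
import bisect
import random


def choose_boundaries(build_keys, num_processors, sample_size=100, seed=0):
    """Step 1: sample the building relation and pick split points at equal quantiles."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(build_keys, min(sample_size, len(build_keys))))
    if not sample:
        return []
    return [sample[(i * len(sample)) // num_processors]
            for i in range(1, num_processors)]


def route(key, boundaries):
    """Map a join attribute value to the processor that owns its range."""
    return bisect.bisect_right(boundaries, key)


def range_partitioned_join(build, probe, key, num_processors):
    boundaries = choose_boundaries([row[key] for row in build], num_processors)
    # Step 2: redistribute the building relation; each processor builds a hash table.
    tables = [dict() for _ in range(num_processors)]
    for row in build:
        tables[route(row[key], boundaries)].setdefault(row[key], []).append(row)
    # Step 3: redistribute the probing relation with the same ranges and probe.
    result = []
    for row in probe:
        table = tables[route(row[key], boundaries)]
        for match in table.get(row[key], []):
            result.append({**match, **row})
    loads = [sum(len(rows) for rows in t.values()) for t in tables]
    return result, loads


build = [{"k": i, "b": i * i} for i in range(100)]
probe = [{"k": 5, "p": "x"}, {"k": 200, "p": "y"}]
matches, loads = range_partitioned_join(build, probe, "k", 4)
```

In this example the sample covers the whole building relation, so the boundaries fall at 25, 50, and 75 and every processor receives 25 building tuples.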
Inter-Operator Load Balancing
• For each operator, it is important to choose how many and which processors to assign for its execution.
• The choice must take into account pipeline parallelism, which requires inter-operator communication.
• This is harder to achieve in shared-nothing systems, for the following reasons:
  • The choice of the degree of parallelism is subject to errors, because both processors and operators are discrete entities.
Inter-Operator Load Balancing (Continued)
  • Processors associated with the latest operators in a pipeline chain may remain idle for a significant time.
• Shared memory allows the parallel execution of independent pipeline chains.
  • These chains are known as tasks.
• The degree of intra-operator parallelism of the tasks is adjusted dynamically in order to reach maximum resource utilization.
Activations
• An activation represents a sequential unit of work.
• It can be executed by any thread.
• It is therefore self-contained.
• It can only be executed in the same SM (shared-memory) node.
Activation Queues
• Move data activations along pipeline chains.
• Also called table queues.
• Threads have unrestricted access to all queues in the same SM-node.
• A small number of queues results in interference.
• To reduce interference, one queue is associated with each thread.
Thread
• A simple strategy for good load balancing is to allocate more threads than there are processors.
• Allocating one thread per processor per query reduces the overhead of interference.
• Each thread consumes as many activations as possible from its own queue to limit interference.
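The interplay of activations, queues, and threads can be sketched as follows. This is an assumption-laden illustration: an activation is modeled as a self-contained `(function, argument)` pair, activations are dealt round-robin to one queue per thread, and each thread drains its own queue before helping with the others. None of these details are prescribed by the slides.

```python
import queue
import threading


def run_activations(activations, num_threads):
    """Execute self-contained activations using one primary queue per thread."""
    queues = [queue.Queue() for _ in range(num_threads)]
    for i, activation in enumerate(activations):
        queues[i % num_threads].put(activation)

    results = []
    results_lock = threading.Lock()

    def worker(index):
        # Visit the thread's own queue first, then the queues of the other threads.
        for q in queues[index:] + queues[:index]:
            while True:
                try:
                    func, arg = q.get_nowait()
                except queue.Empty:
                    break
                value = func(arg)
                with results_lock:
                    results.append(value)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def square(x):
    return x * x


squares = sorted(run_activations([(square, i) for i in range(10)], num_threads=3))
```

Because every queue is filled before the threads start, a thread can stop as soon as it has found all queues empty; a pipelined engine that produces activations while running would need an explicit termination signal instead.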
THANK YOU
Reference:
• M. Tamer Özsu and Patrick Valduriez, Principles of Distributed Database Systems, Third Edition.

