Parallel Algorithms
Shashikant V. Athawale
Assistant Professor, Computer Engineering Department
AISSMS College of Engineering, Kennedy Road, Pune, MS, India - 411001
Parallel Algorithms
Parallel: perform more than one operation at a time.
PRAM model: Parallel Random Access Machine.
[Figure: processors p0, p1, …, pn-1 connected to a shared memory]
Multiple processors connected to a shared memory.
Each processor can access any memory location in unit time.
All processors can access memory in parallel.
All processors can perform operations in parallel.
Concurrent vs. Exclusive Access
Four models:
EREW: exclusive read, exclusive write
CREW: concurrent read, exclusive write
ERCW: exclusive read, concurrent write
CRCW: concurrent read, concurrent write
Handling write conflicts:
Common-write model: concurrent writes succeed only if all processors write the same value.
Arbitrary-write model: an arbitrary one succeeds.
Priority-write model: the one with the smallest index succeeds.
EREW and CRCW are the most popular.
Synchronization and Control
Synchronization:
One of the most important and complicated issues.
Suppose all processors are inherently tightly synchronized:
 All processors execute the same statement at the same time.
 There is no race among processors, i.e., all proceed at the same pace.
Termination control of a parallel loop:
Depends on the state of all processors.
Can be tested in O(1) time.
Pointer Jumping – list ranking
Given a singly linked list L with n objects, compute, for each object in L, its distance from the end of the list.
Formally: if next is the pointer field, then
 d[i] = 0 if next[i] = nil
 d[i] = d[next[i]] + 1 if next[i] ≠ nil
Serial algorithm: Θ(n).
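As a point of reference, here is a minimal sequential sketch of this recurrence in Python; the array-of-successors representation, the `next_` and `head` names, and the example list are illustrative assumptions, not part of the slides:

```python
def serial_list_rank(next_, head):
    """Serial list ranking in Theta(n): walk the list once from the head,
    then assign each object its distance from the end."""
    order = []
    i = head
    while i is not None:              # collect the objects in list order
        order.append(i)
        i = next_[i]
    n = len(order)
    return {obj: n - 1 - pos for pos, obj in enumerate(order)}

# Example: a 6-object list whose objects, in list order, are 3 -> 4 -> 6 -> 1 -> 0 -> 5.
next_ = {3: 4, 4: 6, 6: 1, 1: 0, 0: 5, 5: None}
print(serial_list_rank(next_, head=3))   # {3: 5, 4: 4, 6: 3, 1: 2, 0: 1, 5: 0}
```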
List ranking – EREW algorithm
 LIST-RANK(L) (in O(lg n) time)
1. for each processor i, in parallel
2. do if next[i] = nil
3. then d[i] ← 0
4. else d[i] ← 1
5. while there exists an object i such that next[i] ≠ nil
6. do for each processor i, in parallel
7. do if next[i] ≠ nil
8. then d[i] ← d[i] + d[next[i]]
9. next[i] ← next[next[i]]
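Below is a hedged Python sketch that simulates LIST-RANK sequentially, taking a snapshot of d and next before each "parallel" round so that, as the slides require, all reads happen before any writes. The dictionary representation and the example list (which reproduces the values in the figure on the next slide) are my own illustrative choices:

```python
def list_rank(nxt):
    """EREW LIST-RANK simulated sequentially.
    nxt maps each object to its successor (None for the last object);
    each while-iteration corresponds to one synchronous PRAM round."""
    d = {i: (0 if nxt[i] is None else 1) for i in nxt}      # lines 1-4
    nxt = dict(nxt)
    while any(nxt[i] is not None for i in nxt):             # line 5
        old_d, old_nxt = dict(d), dict(nxt)                 # snapshot: reads before writes
        for i in old_nxt:                                   # lines 6-7: "in parallel"
            if old_nxt[i] is not None:
                d[i] = old_d[i] + old_d[old_nxt[i]]         # line 8
                nxt[i] = old_nxt[old_nxt[i]]                # line 9: pointer jumping
    return d

nxt = {3: 4, 4: 6, 6: 1, 1: 0, 0: 5, 5: None}
print(list_rank(nxt))   # {3: 5, 4: 4, 6: 3, 1: 2, 0: 1, 5: 0}
```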
[Figure: pointer jumping on a 6-object list (objects in list order: 3, 4, 6, 1, 0, 5); d values at each stage:
(a) initial: 1 1 1 1 1 0
(b) after iteration 1: 2 2 2 2 1 0
(c) after iteration 2: 4 4 3 2 1 0
(d) after iteration 3: 5 4 3 2 1 0]
List ranking – correctness of EREW algorithm
Loop invariant: for each i, the sum of the d values in the sublist headed by i is the correct distance from i to the end of the original list L.
Parallel memory must be synchronized: the reads on the right-hand side must occur before the writes on the left-hand side. Moreover, every processor reads d[i] first and then d[next[i]].
It is an EREW algorithm: every read and write is exclusive. For an object i, its own processor reads d[i]; then, in the next read step, the processor of its predecessor reads d[i]. Writes all go to distinct locations.
List ranking – EREW algorithm running time
O(lg n):
The initialization for loop runs in O(1).
Each iteration of the while loop runs in O(1).
There are exactly ⌈lg n⌉ iterations:
 Each iteration transforms each list into two interleaved lists: one consisting of the objects in even positions, the other of the objects in odd positions. Thus, each iteration doubles the number of lists but halves their lengths.
The termination test in line 5 runs in O(1).
Define work = #processors × running time; here the work is O(n lg n).
Parallel prefix on a list
A prefix computation is defined as:
Input: <x1, x2, …, xn>
Binary associative operation ⊗
Output: <y1, y2, …, yn>
Such that:
 y1 = x1
 yk = yk-1 ⊗ xk for k = 2, 3, …, n, i.e., yk = x1 ⊗ x2 ⊗ … ⊗ xk.
Suppose <x1, x2, …, xn> are stored in order in a linked list.
Define the notation: [i, j] = xi ⊗ xi+1 ⊗ … ⊗ xj
Prefix computation LIST-PREFIX(L)
1. for each processor i, in parallel
2. do y[i] ← x[i]
3. while there exists an object i such that next[i] ≠ nil
4. do for each processor i, in parallel
5. do if next[i] ≠ nil
6. then y[next[i]] ← y[i] ⊗ y[next[i]]
7. next[i] ← next[next[i]]
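A similar sequential Python sketch of LIST-PREFIX follows; the associative operation defaults to addition, so the example computes prefix sums. The representation and names are illustrative assumptions:

```python
import operator

def list_prefix(x, nxt, op=operator.add):
    """LIST-PREFIX simulated sequentially; op is the associative operation ⊗.
    Each while-iteration corresponds to one synchronous PRAM round."""
    y = dict(x)                                                 # lines 1-2
    nxt = dict(nxt)
    while any(nxt[i] is not None for i in nxt):                 # line 3
        old_y, old_nxt = dict(y), dict(nxt)                     # snapshot: reads before writes
        for i in old_nxt:                                       # lines 4-5: "in parallel"
            if old_nxt[i] is not None:
                y[old_nxt[i]] = op(old_y[i], old_y[old_nxt[i]])   # line 6
                nxt[i] = old_nxt[old_nxt[i]]                      # line 7
    return y

# Six objects in list order 1 -> 2 -> ... -> 6 with x_i = i and ⊗ = +:
nxt = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: None}
x = {i: i for i in nxt}
print(list_prefix(x, nxt))   # prefix sums {1: 1, 2: 3, 3: 6, 4: 10, 5: 15, 6: 21}
```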
[Figure: parallel prefix on a 6-object list x1, …, x6; the ⊗-product [i, j] held at each object:
(a) initial: [1,1] [2,2] [3,3] [4,4] [5,5] [6,6]
(b) after iteration 1: [1,1] [1,2] [2,3] [3,4] [4,5] [5,6]
(c) after iteration 2: [1,1] [1,2] [1,3] [1,4] [2,5] [3,6]
(d) after iteration 3: [1,1] [1,2] [1,3] [1,4] [1,5] [1,6]]
Find root – CREW algorithm
Suppose a forest of binary trees in which each node i has a pointer parent[i].
Find, for each node, the identity (root) of its tree.
Assume that each node is associated with a processor.
Assume that each node i has a field root[i].
Find roots – CREW algorithm
 FIND-ROOTS(F)
1. for each processor i, in parallel
2. do if parent[i] = nil
3. then root[i] ← i
4. while there exists a node i such that parent[i] ≠ nil
5. do for each processor i, in parallel
6. do if parent[i] ≠ nil
7. then root[i] ← root[parent[i]]
8. parent[i] ← parent[parent[i]]
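The sketch below simulates FIND-ROOTS sequentially. One caveat of the simulation: in the PRAM algorithm root[parent[i]] may be read before it has been meaningfully set (the value is corrected in a later round); the guard in the code simply avoids a KeyError in this sequential version and does not change the final result. The forest encoding, names, and example are illustrative assumptions:

```python
def find_roots(parent):
    """CREW FIND-ROOTS simulated sequentially.
    parent maps each node to its parent (None for a root); the concurrent reads
    of root[parent[i]] are modelled by reading from a pre-round snapshot."""
    parent = dict(parent)
    root = {i: i for i in parent if parent[i] is None}      # lines 1-3
    while any(parent[i] is not None for i in parent):       # line 4
        old_root, old_parent = dict(root), dict(parent)     # snapshot: reads before writes
        for i in old_parent:                                # lines 5-6: "in parallel"
            if old_parent[i] is not None:
                if old_parent[i] in old_root:               # guard only for this simulation
                    root[i] = old_root[old_parent[i]]       # line 7
                parent[i] = old_parent[old_parent[i]]       # line 8: pointer jumping
    return root

# A forest with two trees: 0 is the root of {0, 1, 2, 3}; 4 is the root of {4, 5}.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: None, 5: 4}
print(find_roots(parent))   # nodes 0,1,2,3 map to 0; nodes 4,5 map to 4
```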
Find root – CREW algorithm
Running time: O(lg d), where d is the depth of the deepest tree in the forest.
All the writes are exclusive.
But the read in line 7 is concurrent, since several nodes may have the same node as parent.
See Figure 30.5.
Find roots – CREW vs. EREW
How fast can n nodes in a forest determine their roots using only exclusive reads?
Ω(lg n).
Argument: with exclusive reads, a given piece of information can be copied to only one other memory location in each step, so the number of locations containing a given piece of information at most doubles per step. Looking at a forest with one tree of n nodes, the root's identity is stored in one place initially. After the first step it is stored in at most two places; after the second step, in at most four places; …; so lg n steps are needed before it can be stored in all n places.
So CREW: O(lg d) and EREW: Ω(lg n).
If d = 2^o(lg n), i.e., lg d = o(lg n), the CREW algorithm outperforms any EREW algorithm.
If d = Θ(lg n), then CREW runs in O(lg lg n), while EREW is much slower.
Find maximum – CRCW algorithm
Given n elements A[0..n-1], find the maximum.
 Suppose n² processors; each processor (i, j) compares A[i] and A[j], for 0 ≤ i, j ≤ n-1.
 FAST-MAX(A)
1. n ← length[A]
2. for i ← 0 to n-1, in parallel
3. do m[i] ← true
4. for i ← 0 to n-1 and j ← 0 to n-1, in parallel
5. do if A[i] < A[j]
6. then m[i] ← false
7. for i ← 0 to n-1, in parallel
8. do if m[i] = true
9. then max ← A[i]
10. return max
The running time is O(1).
Note: there may be multiple maximum values, so their processors will write to max concurrently; they all write the same value, so the common-write model suffices. Work = n² × O(1) = O(n²).
Comparison table for A = (5, 6, 9, 2, 9); entry (i, j) is true iff A[i] < A[j]:
A[i]\A[j]   5  6  9  2  9   m
    5       F  T  T  F  T   F
    6       F  F  T  F  T   F
    9       F  F  F  F  F   T
    2       T  T  T  F  T   F
    9       F  F  F  F  F   T
max = 9
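A sequential Python sketch of FAST-MAX follows; it performs the n² comparisons that the CRCW machine would do in a single parallel step, and the concurrent writes it imitates all carry equal values, as the common-write model requires. Names are illustrative assumptions:

```python
def fast_max(A):
    """CRCW FAST-MAX simulated sequentially: O(1) parallel time with n^2 processors.
    The nested loops stand for the n^2 processors acting in one parallel step."""
    n = len(A)                               # line 1
    m = [True] * n                           # lines 2-3
    for i in range(n):                       # lines 4-6: processor (i, j) "in parallel"
        for j in range(n):
            if A[i] < A[j]:
                m[i] = False                 # A[i] loses a comparison, so it is not the max
    maximum = None
    for i in range(n):                       # lines 7-9
        if m[i]:
            maximum = A[i]                   # all winners write the same value
    return maximum                           # line 10

print(fast_max([5, 6, 9, 2, 9]))   # 9
```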
Find maximum – CRCW vs. EREW
Finding the maximum with EREW takes Ω(lg n).
Argument: consider how many elements "think" they might be the maximum.
Initially, n; after the first step, n/2; after the second step, n/4; …; each step halves the count.
Moreover, even CREW takes Ω(lg n).
Simulating CRCW with EREW
Theorem:
A p-processor CRCW algorithm can be no more than O(lg p) times faster than the best p-processor EREW algorithm for the same problem.
Proof idea: each step of the CRCW algorithm can be simulated by O(lg p) EREW steps.
Consider a concurrent write:
 CRCW processor pi writes datum xi to location li (li may be the same for multiple pi's).
 The corresponding EREW pi writes the pair (li, xi) to location A[i] (the A[i]'s are distinct), so the write is exclusive.
 Sort all pairs (li, xi) by li, bringing identical locations together, in O(lg p) time.
 Each EREW pi compares A[i] = (lj, xj) and A[i-1] = (lk, xk). If lj ≠ lk or i = 0, then pi writes xj to location lj: an exclusive write, since only the first pair destined for each location performs it.
See Figure 30.7.
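To make the proof concrete, here is a hedged sequential sketch of simulating one concurrent-write step with exclusive writes: each processor deposits its (location, value) pair in its own cell, the pairs are sorted by location (on a real EREW PRAM a parallel merge sort does this in O(lg p) steps, which is where the slowdown comes from), and then only the first processor in each run of equal locations performs the actual write. All names and the example are illustrative assumptions:

```python
def simulate_crcw_write(writes, memory):
    """One CRCW concurrent-write step simulated with exclusive writes (sequential sketch).
    writes[i] = (l_i, x_i): processor i wants to write x_i to location l_i."""
    # Step 1 (exclusive writes): processor i stores its pair (l_i, x_i) in its own cell A[i].
    A = list(enumerate(writes))
    # Step 2: sort the pairs by target location, bringing equal locations together.
    A.sort(key=lambda entry: entry[1][0])
    # Step 3 (exclusive writes): processor i writes only if its location differs from its
    # left neighbour's, so exactly one pending write reaches each location.
    for i, (proc, (loc, val)) in enumerate(A):
        if i == 0 or A[i - 1][1][0] != loc:
            memory[loc] = val
    return memory

# Four processors; processors 0 and 2 contend for location 'a' (with a common value).
print(simulate_crcw_write([('a', 1), ('b', 7), ('a', 1), ('c', 3)], memory={}))
# {'a': 1, 'b': 7, 'c': 3}
```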
CRCW vs. EREW
CRCW:
Some say: it is easier to program and faster.
Others say: the hardware for CRCW is slower than for EREW, and one cannot really find the maximum in O(1).
Still others say: both EREW and CRCW are the wrong models. Processors must be connected by a network and can communicate with each other only via the network, so the network should be part of the model.
 Thank You