Parallel Sorting Algorithms
Sorting Algorithms
• Given an array A of n elements with a total order ≤ defined, return an array B of the same elements in sorted order, i.e., B[i] ≤ B[i+1] for all i
• Sometimes in-place algorithms are preferred
Sorting Algorithms
• Given an array A of n elements with a total order ≤ defined, return an array B of the same elements in sorted order
• Sometimes in-place algorithms are preferred
• How can we sort them sequentially?
• Lower bound in the comparison model: Ω(n log n) comparisons needed
• Using radix-based sorting (e.g., assume all of them are small integers): O(n) time needed for certain cases
Sequential sorting algorithms: some simple ones
• Selection sort:
• Find the smallest element and put it in the first slot; then, among the rest, find the smallest and put it in the second slot, …
• Bubble sort:
• Compare all adjacent elements A[i] and A[i+1], and if A[i] is greater, swap them; repeat until sorted
Sequential sorting algorithms: mergesort

void mergesort(int *B, int *A, int n) {
  if (n == 1) { B[0] = A[0]; }
  else {
    int C[n];
    mergesort(C, A, n/2);
    mergesort(C+n/2, A+n/2, n-n/2);
    merge(C, n/2, C+n/2, n-n/2, B);  // merge the two sorted halves into B
  }
}
[Figure: mergesort on the array 5 12 7 9 26 10 2 16 — divide-and-conquer splits it down to base cases, then successive merges produce 5 7 9 12 and 2 10 16 26, and finally 2 5 7 9 10 12 16 26. How to merge?]
Sequential sorting algorithms: mergesort
• Split the array evenly in two
• Sort each of them recursively
• Merge them back – how?
• E.g., merging 0 4 7 8 with 1 2 3 5 6 9 gives 0 1 2 3 4 5 6 7 8 9
• Costs O(na + nb) time to merge two arrays of total size na + nb
merge(A, na, B, nb, C) {
  p1 = 0; p2 = 0; p3 = 0;
  while ((p1 < na) && (p2 < nb)) {
    if (A[p1] < B[p2]) {
      C[p3++] = A[p1]; p1++;
    } else {
      C[p3++] = B[p2]; p2++;
    }
  }
  // copy the rest of the unfinished array
  return C;
}
Sequential sorting algorithms: mergesort
• O(log n) rounds of recursion
• Each round costs O(n) time in total
• Total cost: O(n log n)
Work: W(n) = 2W(n/2) + O(n) = O(n log n), from the Master Theorem
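Writing the recurrence out explicitly (a standard derivation, filling in the bounds the slide images dropped):

```latex
W(n) = 2\,W\!\left(\tfrac{n}{2}\right) + O(n)
\;\Rightarrow\;
W(n) = O(n \log n)
\quad \text{(Master Theorem, balanced case: each of the } \log n \text{ levels does } O(n) \text{ work).}
```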
Sequential sorting algorithms: quicksort
• Find a random pivot p in the array (e.g., the middle one)
• Put all elements in the array that are smaller than p on the left, and all elements that are greater than p on the right
• Recurse on each side
[Figure: quicksort on 6 2 9 4 3 1 5 8 7 0 — pivot 6 partitions it into 2 4 1 3 5 0 | 6 | 8 9 7; recursing with pivots 2 and 8 (then 1, 4, …) yields 0 1 2 3 4 5 6 7 8 9]
• How to move elements around?
Sequential sorting algorithms: quicksort
• How to move elements around? (using 6 as a pivot)

Partition(A, n, x) {
  i = 0; j = n-1;
  while (i < j) {
    while (A[i] < x) i++;
    while (A[j] > x) j--;
    if (i < j) {
      swap A[i] and A[j];
      i++; j--;
    }
  }
}

• E.g., 6 2 9 4 1 3 5 8 7 0 → 0 2 9 4 1 3 5 8 7 6 → 0 2 5 4 1 3 9 8 7 6
• O(n) time for one round
Quicksort cost analysis
• Pivot is chosen uniformly at random
• 1/2 chance that the pivot falls in the middle range (the middle n/2 keys in sorted order), in which case each sub-problem has size at most 3n/4
• Expected #rounds: O(log n) (also w.h.p., with high probability)
∙ w.h.p. means that the failure probability is at most 1/n^c for some constant c
∙ E.g., the probability that a quicksort doesn’t finish in O(log n) rounds is no more than 1/n^c
• Each round needs O(n) time (partition)
• In total O(n log n) time in expectation
[Figure: the keys in sorted order — n/4 keys, then the middle n/2 keys (the good pivots), then n/4 keys]
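The round bound can be made explicit with a back-of-the-envelope calculation (standard reasoning, filling in the bounds the slide images dropped):

```latex
\Pr[\text{pivot among the middle } n/2 \text{ keys}] = \tfrac{1}{2}
\;\Rightarrow\;
\text{each round shrinks the problem to} \le \tfrac{3}{4}n \text{ with probability } \tfrac{1}{2}
\;\Rightarrow\;
\mathbb{E}[\#\text{rounds}] = O(\log_{4/3} n) = O(\log n).
```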
Sequential sorting algorithms
• Quicksort is usually “quicker” than mergesort
• Mergesort needs additional space, while quicksort is in-place
• Each recursive call in quicksort deals with a consecutive chunk of the input – better cache locality
• What about in parallel?
Parallel Quicksort

Sequential quicksort:
• Use a pivot and partition the array into two parts
• Sort each of them recursively

Parallel quicksort:
• Use a pivot and partition the array into two parts
• Sort each of them recursively, in parallel
Parallel quicksort
• The partitioning algorithm costs O(n) time. So even if the problem is always perfectly partitioned: D(n) = D(n/2) + O(n) = O(n)
• O(n) depth – no better than sequential?
• Have to partition in parallel!
• How to move things around?
• Need to pack all elements smaller than the pivot and all elements larger than the pivot
• The filter algorithm!
Parallel filtering / packing
• Given an array A of n elements and a predicate function f, output an array B with the elements x in A that satisfy f(x)
• E.g., with f(x) = true if x is odd, false if x is even:
A = 4 2 9 3 6 5 7 11 10 8
B = 9 3 5 7 11
Using filter for partition
• How to move elements around? The filter algorithm! (using 6 as a pivot)

A:              6 2 9 4 1 3 5 8 7 0
flag:           0 1 0 1 1 1 1 0 0 1
prefix sum ps:  0 1 1 2 3 4 5 5 5 6
pack:           2 4 1 3 5 0

filter(A, flag, n) {
  ps = scan(flag);  // inclusive prefix sum of flag
  parallel_for (i = 0 to n-1) {
    if (flag[i]) B[ps[i]-1] = A[i];
  }
}

• O(n) work and O(log n) depth for one round
Using filter to partition
• To get all elements smaller than the pivot and all elements larger than the pivot
• We can run two separate filters
• But that is two rounds of I/O and global data movement
• Parallel partition:
• After doing the first scan, we already know the result of the second scan!
Using filter for partition
• Using 6 as a pivot:

A:     6 2 9 4 1 3 5 8 7 0
flag:  0 1 0 1 1 1 1 0 0 1
scan1: 0 1 1 2 3 4 5 5 5 6
scan2: 1 1 2 2 2 2 2 3 4 4
pack (smaller): 2 4 1 3 5 0
pack (larger):  6 9 8 7

scan1[]: the prefix sum of the 1s
scan2[]: the prefix sum of the 0s
=> scan1[i] + scan2[i] = i + 1 (inclusive scans, 0-based i)
=> scan2[i] = i + 1 - scan1[i], so no second scan is needed
Parallel quicksort
• Using the filter algorithm to do the partition
• Finishes in O(log n) rounds in expectation (also w.h.p.)
• Each round needs O(n) work and O(log n) depth
• O(n log n) work and O(log^2 n) depth in total
Parallel Merge Sort

Sequential merge sort:
• Split the array evenly in two
• Sort each of them recursively
• Merge them back

Parallel merge sort:
• Split the problem evenly in two
• Sort each of them recursively, in parallel
• Merge them back
Parallel merge sort
• The merging algorithm costs O(n) time. So D(n) = D(n/2) + O(n) = O(n)
• O(n) depth – no better than sequential?
• Have to merge in parallel!
A parallel merge algorithm
• Find the median of one array
• Binary search for it in the other array
• Put it in the correct slot of the output
• Recursively, in parallel do:
• Merge the left two sub-arrays into the left half of the output
• Merge the right ones into the right half of the output
[Figure: merging 2 3 4 6 9 with 0 1 5 7 8 — the median 4 of the first array is binary-searched into the second, landing after 0 1, so it goes to its final output slot. This leaves Subproblem 1: merge 2,3 with 0,1, and Subproblem 2: merge 6,9 with 5,7,8]
A parallel merge algorithm

// merge array A’ of length n1 and array B’ of length n2 into array C
Merge(A’, n1, B’, n2, C) {
  if (A’ is empty or B’ is empty) base_case;
  m = n1/2;
  m2 = binary_search(B’, A’[m]);  // #elements of B’ smaller than A’[m]
  C[m+m2] = A’[m];
  in parallel:
    merge(A’, m, B’, m2, C);
    merge(A’+m+1, n1-m-1, B’+m2, n2-m2, C+m+m2+1);
  return C;
}
A parallel merge algorithm
• In each recursive call the only real work is the binary search
• Assume the original input arrays are A and B, both of size n
• In each recursive call we deal with sub-arrays A’ and B’, which can have different sizes
• The sub-array from A is always perfectly halved, but that is not the case for the sub-array from B. Still, as soon as A’ is empty, we reach a base case.
• So in O(log n) rounds we reach the base cases. In each round the cost (one binary search) is also O(log n).

D(n) = O(log^2 n)
Parallel Merge: work
• Round 1: 1 element of A searches in B, takes O(log n) time
• Round 2: 2 elements of A search in pieces of B, taking at most 2 log(n/2) time in total
• Round 3: 4 elements of A search, at most 4 log(n/4)
• Round 4: 8 elements of A search, at most 8 log(n/8) …
Concavity of log:
(f(x) + f(y))/2 ≤ f((x + y)/2)
More generally: the sum of k logs is no more than k times the log of their average value
(the average of the logs is no more than the log of the average of the input variables)
W(n) = W(n1) + W(n − n1) + log n
Parallel Merge: work
• Round 1: 1 element of A searches in B, takes O(log n) time
• Round 2: 2 elements of A search, at most 2 log(n/2) in total
• Round 3: 4 elements of A search, at most 4 log(n/4)
• Round 4: 8 elements of A search, at most 8 log(n/8) …
W(n) = W(n1) + W(n − n1) + log n
This recurrence is leaf-dominated; use the Master Theorem: W(n) = O(n)
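Summing the per-round bounds above makes the leaf-dominated total explicit (a standard calculation, not written out on the slide):

```latex
W(n) \;\le\; \sum_{i=0}^{\log_2 n} 2^i \log\frac{n}{2^i}
\;=\; \sum_{i=0}^{\log_2 n} 2^i (\log_2 n - i)
\;=\; O(n),
```

since the terms grow geometrically and the last (leaf-level) terms dominate the sum.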
Parallel merge sort
• Parallel merge: O(n) work and O(log^2 n) depth
• Can be easily reduced to O(log n) depth – your homework
• Finishes in O(log n) rounds
• Total work: O(n log n), depth: O(log^3 n)
• Can be easily reduced to O(log^2 n) depth – your homework
Parallel sorting algorithms
• Quicksort
• O(n log n) work, O(log^2 n) depth (in expectation)
• Mergesort
• O(n log n) work, O(log^3 n) depth // can be reduced to O(log^2 n) depth with a simple variant, you’ll see it in your homework
• Quicksort is not “quick” any more
• It needs additional space for filtering/packing
• Better depth bound?
Parallel sorting in O(n^2) work and O(log n) depth
• For an array of size n, there are n(n−1)/2 pairs of elements
• Comparing all of them gives us all the information needed (there is redundant information, but let’s just store all of it)
• The comparisons tell us:
• For a certain element x, the relevant comparisons tell us how many elements are smaller than it
• That can be computed by a parallel reduce (add 1 for each element smaller than x)
• That is the rank of x!
• Directly write x to the rank-th location in the output
• The work is O(n^2) since we need to compare all the pairs
• The depth is O(log n) because of the reduce algorithm
• This algorithm actually parallelizes selection sort
List Ranking

Linked Lists
• Linked lists are simple and important data structures
• Sometimes we have a tree of nodes with pointers indicating their parents
• We want to know the rank of each node (e.g., the distance to the head/tail)

List Ranking
Source: “Parallel Algorithms” by Guy E. Blelloch and Bruce M.
• Input array P, where P[i] = j means that the i-th element’s parent is the j-th element
• In practice the input can be a linked list with next/parent pointers
• Sequentially: follow the pointers until reaching the root
Work-Efficient List Ranking
Idea: reduce the problem size by a constant factor per round, and apply the algorithm recursively.

ListRanking(list P)
1. If the list has two or fewer nodes, then return // base case
2. Every node flips a fair coin
3. For each node u (except the last node), if u flipped Tails and P[u] flipped Heads, then u is paired with P[u] and P[u] is contracted away:
   A. rank(u) = rank(u) + rank(P[u])
   B. P[u] = P[P[u]]
4. Recursively call ListRanking on the smaller list
5. Insert contracted nodes v back into the list with rank(v) = rank(v) + rank(P[v])

[Figure: a list with ranks 1 1 1 1 1 0 and coin flips T H H T H T contracts to a list with ranks 2 1 2 0]

An element is removed if it flipped Heads and its previous element flipped Tails.
Source: MIT 6.886 by Julian Shun
Work-Efficient List Ranking
[Figure: ranks 1 1 1 1 1 0 with flips T H H T H T; contract + packing gives 2 1 2 0; applying the algorithm recursively gives 5 3 2 0; expanding reinserts the contracted nodes, giving the final ranks 5 4 3 2 1 0]
Source: MIT 6.886 by Julian Shun
Work-Depth Analysis
• The number of nodes per round is reduced by (n−1)/4 in expectation
• For every node u except the last, the probability that u flipped Heads and its previous element flipped Tails is 1/4
• => A node gets removed with probability 1/4
• Each round takes linear work and O(log n) depth
• Expected work: W(n) ≤ W(3n/4) + O(n)
• Expected depth: D(n) ≤ D(3n/4) + O(log n)
W = O(n)
D = O(log^2 n) in arbitrary-forking
Randomization is our good friend!
Source: MIT 6.886 by Julian Shun
Parallel sorting algorithms
• Quicksort
• O(n log n) work, O(log^2 n) depth
• Mergesort
• O(n log n) work, O(log^3 n) depth // can be reduced to O(log^2 n) depth with a simple variant, you’ll see it in your homework
• Selection sort (rank sort): O(n^2) work, O(log n) depth
• In parallel algorithm design, it is likely that to get better depth, you need to pay more work – there is a tradeoff
• Choose the best algorithm depending on your application
Parallel sorting algorithms
• Quicksort
• O(n log n) work, O(log^2 n) depth
• Mergesort
• O(n log n) work, O(log^3 n) depth // can be reduced to O(log^2 n) depth with a simple variant, you’ll see it in your homework
• Selection sort (rank sort): O(n^2) work, O(log n) depth
• Usually, parallel sample sort has the best performance in practice – we’ll cover that in the lectures about I/O efficiency
• The techniques in quicksort and mergesort are also useful for samplesort
Course Project
• If you plan to work on sorting algorithms – you can start thinking now
• Generally, what you can consider:
• Evaluate and compare several existing algorithms
• Test the influence of some factors on the performance of one or several algorithms
• E.g., different environments / compilers / compilation settings / schedulers / input instances (input distribution, graph structure, etc.) / machines / #of cores / cache sizes / …
• Implement a fairly complicated algorithm we learnt in class
• Write down how you made it correct / improved its performance step by step
• Propose optimizations for existing algorithms
• Propose a new algorithm
Course Project – what to do
• Motivation / intro / background
• What is the definition of the problem? What are the applications? How do existing solutions solve it? What is the high-level idea/intuition of your project?
• Methodology
• Pseudocode of the algorithm you implemented? What optimizations do you use, and why do you think they would help? Is there any theoretical guarantee for your methodology?
• Experiments
• Evaluate different settings
• Compare with others’ implementations
• Did your optimization work? If so, how much does it help? If not, do you have a theory why?
• Conclusion
• What does the result tell you? What did you learn from the project? Are there things that you tried but didn’t work? Is there any potential future work?
Ad

Recommended

MergesortQuickSort.ppt
MergesortQuickSort.ppt
AliAhmad38278
 
presentation_mergesortquicksort_1458716068_193111.ppt
presentation_mergesortquicksort_1458716068_193111.ppt
ajiths82
 
Lecture23
Lecture23
Dr Sandeep Kumar Poonia
 
UNIT V Searching Sorting Hashing Techniques [Autosaved].pptx
UNIT V Searching Sorting Hashing Techniques [Autosaved].pptx
VISWANATHAN R V
 
UNIT V Searching Sorting Hashing Techniques [Autosaved].pptx
UNIT V Searching Sorting Hashing Techniques [Autosaved].pptx
kncetaruna
 
Parallel Algorithms
Parallel Algorithms
Dr Sandeep Kumar Poonia
 
free power point ready to download right now
free power point ready to download right now
waroc73256
 
Merge sort and quick sort
Merge sort and quick sort
Shakila Mahjabin
 
Data Structure and algorithms for software
Data Structure and algorithms for software
ManishShukla712917
 
Merge sort analysis and its real time applications
Merge sort analysis and its real time applications
yazad dumasia
 
SORT AND SEARCH ARRAY WITH WITH C++.pptx
SORT AND SEARCH ARRAY WITH WITH C++.pptx
narifmsit18seecs
 
Chapter 8 advanced sorting and hashing for print
Chapter 8 advanced sorting and hashing for print
Abdii Rashid
 
Advanced s and s algorithm.ppt
Advanced s and s algorithm.ppt
LegesseSamuel
 
03_sorting and it's types with example .ppt
03_sorting and it's types with example .ppt
vanshii9976
 
03_sorting123456789454545454545444543.ppt
03_sorting123456789454545454545444543.ppt
ssuser7b9bda1
 
02_Gffdvxvvxzxzczcczzczcczczczxvxvxvds2.ppt
02_Gffdvxvvxzxzczcczzczcczczczxvxvxvds2.ppt
DarioVelo1
 
Algorithms and Data structures: Merge Sort
Algorithms and Data structures: Merge Sort
pharmaci
 
Quick Sort , Merge Sort , Heap Sort
Quick Sort , Merge Sort , Heap Sort
Mohammed Hussein
 
quick and merge.pptx
quick and merge.pptx
LakshayYadav46
 
Chapter 1 - Introduction to Searching and Sorting Algorithms - Student.pdf
Chapter 1 - Introduction to Searching and Sorting Algorithms - Student.pdf
mylinhbangus
 
Chapter-2.pptx
Chapter-2.pptx
selemonGamo
 
Quicksort
Quicksort
Gayathri Gaayu
 
Data analysis and algorithm analysis presentation
Data analysis and algorithm analysis presentation
ShafiEsa1
 
module2_dIVIDEncONQUER_2022.pdf
module2_dIVIDEncONQUER_2022.pdf
Shiwani Gupta
 
Sorting pnk
Sorting pnk
pinakspatel
 
Data structure using c module 3
Data structure using c module 3
smruti sarangi
 
sorting-160810203705.pptx
sorting-160810203705.pptx
VarchasvaTiwari2
 
Chapter 8 Sorting in the context of DSA.pptx
Chapter 8 Sorting in the context of DSA.pptx
Dibyesh1
 
Reimagining Software Development and DevOps with Agentic AI
Reimagining Software Development and DevOps with Agentic AI
Maxim Salnikov
 
OpenChain Webinar - AboutCode - Practical Compliance in One Stack – Licensing...
OpenChain Webinar - AboutCode - Practical Compliance in One Stack – Licensing...
Shane Coughlan
 

More Related Content

Similar to Parallel Sorting Algorithms. Quicksort. Merge sort. List Ranking (20)

Data Structure and algorithms for software
Data Structure and algorithms for software
ManishShukla712917
 
Merge sort analysis and its real time applications
Merge sort analysis and its real time applications
yazad dumasia
 
SORT AND SEARCH ARRAY WITH WITH C++.pptx
SORT AND SEARCH ARRAY WITH WITH C++.pptx
narifmsit18seecs
 
Chapter 8 advanced sorting and hashing for print
Chapter 8 advanced sorting and hashing for print
Abdii Rashid
 
Advanced s and s algorithm.ppt
Advanced s and s algorithm.ppt
LegesseSamuel
 
03_sorting and it's types with example .ppt
03_sorting and it's types with example .ppt
vanshii9976
 
03_sorting123456789454545454545444543.ppt
03_sorting123456789454545454545444543.ppt
ssuser7b9bda1
 
02_Gffdvxvvxzxzczcczzczcczczczxvxvxvds2.ppt
02_Gffdvxvvxzxzczcczzczcczczczxvxvxvds2.ppt
DarioVelo1
 
Algorithms and Data structures: Merge Sort
Algorithms and Data structures: Merge Sort
pharmaci
 
Quick Sort , Merge Sort , Heap Sort
Quick Sort , Merge Sort , Heap Sort
Mohammed Hussein
 
quick and merge.pptx
quick and merge.pptx
LakshayYadav46
 
Chapter 1 - Introduction to Searching and Sorting Algorithms - Student.pdf
Chapter 1 - Introduction to Searching and Sorting Algorithms - Student.pdf
mylinhbangus
 
Chapter-2.pptx
Chapter-2.pptx
selemonGamo
 
Quicksort
Quicksort
Gayathri Gaayu
 
Data analysis and algorithm analysis presentation
Data analysis and algorithm analysis presentation
ShafiEsa1
 
module2_dIVIDEncONQUER_2022.pdf
module2_dIVIDEncONQUER_2022.pdf
Shiwani Gupta
 
Sorting pnk
Sorting pnk
pinakspatel
 
Data structure using c module 3
Data structure using c module 3
smruti sarangi
 
sorting-160810203705.pptx
sorting-160810203705.pptx
VarchasvaTiwari2
 
Chapter 8 Sorting in the context of DSA.pptx
Chapter 8 Sorting in the context of DSA.pptx
Dibyesh1
 
Data Structure and algorithms for software
Data Structure and algorithms for software
ManishShukla712917
 
Merge sort analysis and its real time applications
Merge sort analysis and its real time applications
yazad dumasia
 
SORT AND SEARCH ARRAY WITH WITH C++.pptx
SORT AND SEARCH ARRAY WITH WITH C++.pptx
narifmsit18seecs
 
Chapter 8 advanced sorting and hashing for print
Chapter 8 advanced sorting and hashing for print
Abdii Rashid
 
Advanced s and s algorithm.ppt
Advanced s and s algorithm.ppt
LegesseSamuel
 
03_sorting and it's types with example .ppt
03_sorting and it's types with example .ppt
vanshii9976
 
03_sorting123456789454545454545444543.ppt
03_sorting123456789454545454545444543.ppt
ssuser7b9bda1
 
02_Gffdvxvvxzxzczcczzczcczczczxvxvxvds2.ppt
02_Gffdvxvvxzxzczcczzczcczczczxvxvxvds2.ppt
DarioVelo1
 
Algorithms and Data structures: Merge Sort
Algorithms and Data structures: Merge Sort
pharmaci
 
Quick Sort , Merge Sort , Heap Sort
Quick Sort , Merge Sort , Heap Sort
Mohammed Hussein
 
Chapter 1 - Introduction to Searching and Sorting Algorithms - Student.pdf
Chapter 1 - Introduction to Searching and Sorting Algorithms - Student.pdf
mylinhbangus
 
Data analysis and algorithm analysis presentation
Data analysis and algorithm analysis presentation
ShafiEsa1
 
module2_dIVIDEncONQUER_2022.pdf
module2_dIVIDEncONQUER_2022.pdf
Shiwani Gupta
 
Data structure using c module 3
Data structure using c module 3
smruti sarangi
 
Chapter 8 Sorting in the context of DSA.pptx
Chapter 8 Sorting in the context of DSA.pptx
Dibyesh1
 

Recently uploaded (20)

Reimagining Software Development and DevOps with Agentic AI
Reimagining Software Development and DevOps with Agentic AI
Maxim Salnikov
 
OpenChain Webinar - AboutCode - Practical Compliance in One Stack – Licensing...
OpenChain Webinar - AboutCode - Practical Compliance in One Stack – Licensing...
Shane Coughlan
 
Open Source Software Development Methods
Open Source Software Development Methods
VICTOR MAESTRE RAMIREZ
 
Milwaukee Marketo User Group June 2025 - Optimize and Enhance Efficiency - Sm...
Milwaukee Marketo User Group June 2025 - Optimize and Enhance Efficiency - Sm...
BradBedford3
 
On-Device AI: Is It Time to Go All-In, or Do We Still Need the Cloud?
On-Device AI: Is It Time to Go All-In, or Do We Still Need the Cloud?
Hassan Abid
 
Microsoft-365-Administrator-s-Guide1.pdf
Microsoft-365-Administrator-s-Guide1.pdf
mazharatknl
 
Application Modernization with Choreo - The AI-Native Internal Developer Plat...
Application Modernization with Choreo - The AI-Native Internal Developer Plat...
WSO2
 
ElectraSuite_Prsentation(online voting system).pptx
ElectraSuite_Prsentation(online voting system).pptx
mrsinankhan01
 
Building Geospatial Data Warehouse for GIS by GIS with FME
Building Geospatial Data Warehouse for GIS by GIS with FME
Safe Software
 
Zoho Creator Solution for EI by Elsner Technologies.docx
Zoho Creator Solution for EI by Elsner Technologies.docx
Elsner Technologies Pvt. Ltd.
 
arctitecture application system design os dsa
arctitecture application system design os dsa
za241967
 
Decipher SEO Solutions for your startup needs.
Decipher SEO Solutions for your startup needs.
mathai2
 
Best Practice for LLM Serving in the Cloud
Best Practice for LLM Serving in the Cloud
Alluxio, Inc.
 
IDM Crack with Internet Download Manager 6.42 Build 41 [Latest 2025]
IDM Crack with Internet Download Manager 6.42 Build 41 [Latest 2025]
pcprocore
 
Test Case Design Techniques – Practical Examples & Best Practices in Software...
Test Case Design Techniques – Practical Examples & Best Practices in Software...
Muhammad Fahad Bashir
 
CodeCleaner: Mitigating Data Contamination for LLM Benchmarking
CodeCleaner: Mitigating Data Contamination for LLM Benchmarking
arabelatso
 
MOVIE RECOMMENDATION SYSTEM, UDUMULA GOPI REDDY, Y24MC13085.pptx
MOVIE RECOMMENDATION SYSTEM, UDUMULA GOPI REDDY, Y24MC13085.pptx
Maharshi Mallela
 
SAP PM Module Level-IV Training Complete.ppt
SAP PM Module Level-IV Training Complete.ppt
MuhammadShaheryar36
 
Best MLM Compensation Plans for Network Marketing Success in 2025
Best MLM Compensation Plans for Network Marketing Success in 2025
LETSCMS Pvt. Ltd.
 
Foundations of Marketo Engage - Programs, Campaigns & Beyond - June 2025
Foundations of Marketo Engage - Programs, Campaigns & Beyond - June 2025
BradBedford3
 
Reimagining Software Development and DevOps with Agentic AI
Reimagining Software Development and DevOps with Agentic AI
Maxim Salnikov
 
OpenChain Webinar - AboutCode - Practical Compliance in One Stack – Licensing...
OpenChain Webinar - AboutCode - Practical Compliance in One Stack – Licensing...
Shane Coughlan
 
Open Source Software Development Methods
Open Source Software Development Methods
VICTOR MAESTRE RAMIREZ
 
Milwaukee Marketo User Group June 2025 - Optimize and Enhance Efficiency - Sm...
Milwaukee Marketo User Group June 2025 - Optimize and Enhance Efficiency - Sm...
BradBedford3
 
On-Device AI: Is It Time to Go All-In, or Do We Still Need the Cloud?
On-Device AI: Is It Time to Go All-In, or Do We Still Need the Cloud?
Hassan Abid
 
Microsoft-365-Administrator-s-Guide1.pdf
Microsoft-365-Administrator-s-Guide1.pdf
mazharatknl
 
Application Modernization with Choreo - The AI-Native Internal Developer Plat...
Application Modernization with Choreo - The AI-Native Internal Developer Plat...
WSO2
 
ElectraSuite_Prsentation(online voting system).pptx
ElectraSuite_Prsentation(online voting system).pptx
mrsinankhan01
 
Building Geospatial Data Warehouse for GIS by GIS with FME
Building Geospatial Data Warehouse for GIS by GIS with FME
Safe Software
 
Zoho Creator Solution for EI by Elsner Technologies.docx
Zoho Creator Solution for EI by Elsner Technologies.docx
Elsner Technologies Pvt. Ltd.
 
arctitecture application system design os dsa
arctitecture application system design os dsa
za241967
 
Decipher SEO Solutions for your startup needs.
Decipher SEO Solutions for your startup needs.
mathai2
 
Best Practice for LLM Serving in the Cloud
Best Practice for LLM Serving in the Cloud
Alluxio, Inc.
 
IDM Crack with Internet Download Manager 6.42 Build 41 [Latest 2025]
IDM Crack with Internet Download Manager 6.42 Build 41 [Latest 2025]
pcprocore
 
Test Case Design Techniques – Practical Examples & Best Practices in Software...
Test Case Design Techniques – Practical Examples & Best Practices in Software...
Muhammad Fahad Bashir
 
CodeCleaner: Mitigating Data Contamination for LLM Benchmarking
CodeCleaner: Mitigating Data Contamination for LLM Benchmarking
arabelatso
 
MOVIE RECOMMENDATION SYSTEM, UDUMULA GOPI REDDY, Y24MC13085.pptx
MOVIE RECOMMENDATION SYSTEM, UDUMULA GOPI REDDY, Y24MC13085.pptx
Maharshi Mallela
 
SAP PM Module Level-IV Training Complete.ppt
SAP PM Module Level-IV Training Complete.ppt
MuhammadShaheryar36
 
Best MLM Compensation Plans for Network Marketing Success in 2025
Best MLM Compensation Plans for Network Marketing Success in 2025
LETSCMS Pvt. Ltd.
 
Foundations of Marketo Engage - Programs, Campaigns & Beyond - June 2025
Foundations of Marketo Engage - Programs, Campaigns & Beyond - June 2025
BradBedford3
 
Ad

Parallel Sorting Algorithms. Quicksort. Merge sort. List Ranking

  • 2. 2 Sorting Algorithms • Given an array of elements with size , and a total order defined, return an array of the same elements in , with: • Sometimes in-place algorithms are preferred
  • 3. 3 Sorting Algorithms • Given an array of elements with size , and a total order defined, return an array of the same elements in , with: • Sometimes in-place algorithms are preferred • How can we sort them sequentially? • Lower bound in the comparison model: comparisons needed • Using radix-based sorting (e.g., assume all of them are integers): time needed for certain cases
  • 4. 4 Sequential sorting algorithms: some simple ones • Selection sort • Find the smallest element and put it in the first slot, then for the rest, find the smallest and put it in the second, … • Bubble sort: • Compare all adjacent elements and , and if is greater, swap them
  • 5. 5 Sequenti al sorting algorith ms: mergesor t void mergesort(int *B, int *A, int n) { if (n==1) B[0] = A[0]; else { int C[n]; mergesort(C, A, n/2); mergesort(C+n/2, A+n/2, n-n/2); B = merge(C, n/2, C+n/2, n-n/2); }} 5 7 9 12 2 10 16 26 2 5 7 9 10 12 16 26 Merge Merge Merge 5 12 7 9 26 10 2 16 5 12 7 9 10 26 2 16 5 12 7 9 26 10 2 16 5 12 7 9 26 10 2 16 5 12 7 9 10 26 2 16 Divide-and-conquer Divide-and-conquer Divide-and-conquer Base cases How to merge?
  • 6. 6 Sequential sorting algorithms: mergesort • Split the array evenly in two. • Sort each of them recursively • Merge them back – how? 0 4 7 8 1 2 3 5 6 9 0 1 2 3 4 5 6 7 8 9 • Costs time to merge two arrays of total size merge(A, na, B, nb) { p1 = 0; p2 = 0; p3 = 0; while ((p1 < na) && (p2< nb)) { if (A[p1]<B[p2]) { C[p3] = A[p1]; p1++ } else { C[p3] = B[p2]; p2++; } } //copy the rest of the unfinished array return C; }
  • 7. 7 Sequential sorting algorithms: mergesort • rounds • Each round costs time • Total Cost: Work: From Master Theorem
  • 8. 8 Sequential sorting algorithms: quicksort • Find a random pivot in the array (e.g., the middle one) • Put all elements in that are smaller than on the left, and all elements in that are greater than on the right 6 2 9 4 3 1 5 8 7 0 2 4 1 3 5 0 6 8 9 7 6 2 1 0 2 4 3 5 6 7 8 9 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 3 • How to move elements around? 8 4 7 8 4
  • 9. 9 Sequential sorting algorithms: quicksort • How to move elements around? (using 6 as a pivot) 6 2 9 4 1 3 5 8 7 0 0 2 9 4 1 3 5 8 7 6 Partition(A, n, x) { i = 0; j = n-1; while (i < j) { while (A[i] < x) i++; while (A[j] > x) j++; if (i < j) swap A[i] and A[j]; i++; j--; } } 0 2 9 4 1 3 5 8 7 6 0 2 5 4 1 3 9 8 7 6 0 2 5 4 1 3 9 8 7 6 • time for one round
  • 10. Quicksort cost analysis • Pivot is chosen uniformly at random • 1/2 chance that pivot falls in middle range, in which case sub-problem size is at most 3n/4 • Expected #rounds: (also w.h.p., with high probability) ∙ w.h.p. means that ∙ E.g., the probability that a quicksort doesn’t finish in rounds is no more than • Each round need time (partition) • In total time n/2 keys n/4 keys n/4 keys Keys in order
  • 11. 11 Sequential sorting algorithms • Quicksort is usually “quicker” than mergesort • Merge sort need additional space, and quicksort is in-place • Each recursive call in quicksort is dealing with a consecutive chunk in the input – better cache locality • What about in parallel?
  • 13. 13 Sequential quicksort • Use a pivot and partition the array into two parts • Sort each of them recursively • Use a pivot and partition the array into two parts • Sort each of them recursively, in parallel Parallel quicksort
  • 14. 14 Parallel quick sort • The partitioning algorithm costs time. So even if the problem is always perfectly partitioned • ? • Have to partition in parallel! • How to move things around? • Need to pack all elements smaller than the pivot and all elements larger than the pivot • The filter algorithm!
  • 15. 15 Parallel filtering / packing • Given an array of elements and a predicate function , output an array with elements in that satisfy 4 2 9 3 6 5 7 11 10 8 9 3 5 7 11 𝑓 (𝑥)={ 𝑡𝑟𝑢𝑒𝑖𝑓 𝑥𝑖𝑠𝑜𝑑𝑑 𝑓𝑎𝑙𝑠𝑒𝑖𝑓 𝑥𝑖𝑠 𝑒𝑣𝑒𝑛 𝐴=¿ 𝐵=¿
  • 16. 16 Using filter for partition • How to move elements around? The filter algorithm! 6 2 9 4 1 3 5 8 7 0 0 1 0 1 1 1 1 0 0 1 filter(A, flag, n) { ps = scan(flag); parallel_for(i=1 to n) { if (ps[i]!=ps[i-1]) B[ps[i]] = A[i]; } } • time for one round A flag X 2 X 4 1 3 5 X X 0 0 1 0 1 1 1 1 0 0 1 A flag 0 1 1 2 3 4 5 5 5 6 Prefix sum of flag using 6 as a pivot 2 4 1 3 5 0 pack
  • 17. 17 Using filter to partition • To get all elements smaller than the pivot and all elements larger than the pivot • We can run two separate filters • Two rounds of I/O and global data movement • Parallel partition • After doing the first scan, we know the result of the second scan!
  • 18. 18 Using filter for partition 6 2 9 4 1 3 5 8 7 0 0 1 0 1 1 1 1 0 0 1 A flag 0 1 1 2 3 4 5 5 5 6 scan1 using 6 as a pivot 2 4 1 3 5 0 pack 1 1 2 2 2 2 2 3 4 4 scan2 scan1[]: the prefix sum of 1s scan2[]: the prefix sum of 0s => scan1[i] + scan2[i] = i => scan2[i] = i - scan1[i] 6 9 8 7
  • 19. 19 Parallel quicksort • Using the filter algorithm to do partition • Finishes in rounds in expectation (also w.h.p.) • Each round need work and depth • work and depth in total
  • 21. 21 Sequential merge sort • Split the array evenly in two • Sort each of them recursively • Merge them back • Split the problem size evenly in two • Sort each of them recursively, in parallel • Merge them back Parallel merge sort
  • 22. 22 Parallel merge sort • The merging algorithm costs time. So • ? • Have to merge in parallel!
  • 23. 23 A parallel merge algorithm • Find the median of one array • Binary search it in the other array • Put in the correct slot • Recursively, in parallel do: • Merge the left two sub-arrays into the left half of the output • Merge the right ones into the right half of the output 9 3 4 6 2 0 1 5 7 8 4 1 2 3 0 9 6 7 8 5 Binary search 3 2 0 1 9 6 5 7 8 Subproblem 1: Merge 2,3 with 0,1 Subproblem 2: Merge 6,9 with 5,7,8
  • 24. 24 A parallel merge algorithm 9 3 4 6 2 0 1 5 7 8 4 1 2 3 0 9 6 7 8 5 Binary search 3 2 0 1 9 6 5 7 8 Subproblem 1: Merge 2,3 with 0,1 Subproblem 2: Merge 6,9 with 5,7,8 //merge array A of length n1 and array B of length n2 into array C. Merge(A’, n1, B’, n2, C) { if (A’ is empty or B’ is empty) base_case; m = n1/2; m2 = binary_search(B’, A’[m]); C[m+m2+1] = A’[m]; in parallel: merge(A’, m, B’, m2, C); merge(A’+m+1, n1-m-1, B’+m2+1, n2-m2-1, C+m+m2); return C; }
  • 25. 25 A parallel merge algorithm • In each recursive call the only work is the binary search • Assume the original input arrays are and . They are both of the same size . • Assume in each recursive call, we are dealing with and , they can have different sizes. • Array from is always perfectly partitioned, but it’s not the case for array . But, as long as is empty, we reach the base case. • So in log n rounds we reach the base case. In each round the cost is also O(log n) 9 3 4 6 2 0 1 5 7 8 4 1 2 3 0 9 6 7 8 5 Binary search 3 2 0 1 9 6 5 7 8 Subproblem 1: Merge 2,3 with 0,1 Subproblem 2: Merge 6,9 with 5,7,8 𝐷 ( 𝑁 )=𝑂(log2 𝑁)
  • 26. 26 Parallel Merge: work • Round 1: 1 element in searches in , takes time • Round 2: 2 elements in search in , takes time . . • Round 3: 4 elements in search in , takes time . . • Round 4: 8 elements in search in , takes time . . Concavity of log: 𝑓 (𝑥+ 𝑦 2 ) 𝑓 ( 𝑥)+ 𝑓 ( 𝑦) 2 𝑓 ( 𝑥)+ 𝑓 ( 𝑦) 2 ≤ 𝑓 (𝑥+ 𝑦 2 ) More generally: Sum of logs is no more than times the log of their average value The average of logs is no more than log of the average of the input variables 𝑾 (𝒏)=𝑾 (𝒏𝟏 )+𝑾 (𝒏 −𝒏𝟏)+𝐥𝐨𝐠 𝒏
  • 27. 27 Parallel Merge: work • Round 1: 1 search in a range of size n: O(log n) • Round 2: 2 searches, total at most 2 log(n/2) • Round 3: 4 searches, total at most 4 log(n/4) • Round 4: 8 searches, total at most 8 log(n/8) • W(n) = W(n1) + W(n − n1) + O(log n) • The per-round costs grow geometrically, so this recurrence is leaf-dominated • Use the Master Theorem (balanced case W(n) = 2W(n/2) + O(log n)): W(n) = O(n)
  • 28. 28 Parallel merge sort • Parallel merge: O(n) work and O(log² n) depth • Can be easily reduced to O(log n) depth – your homework • Finishes in O(log n) rounds • Total work: O(n log n), depth: O(log³ n) • Can be easily reduced to O(log² n) depth – your homework
  • 29. 29 Parallel sorting algorithms • Quicksort • O(n log n) expected work, O(log² n) depth • Mergesort • O(n log n) work, O(log³ n) depth //can be reduced to O(log² n) depth with a simple variant, you’ll see it in your homework • Quicksort is not “quick” any more • It needs additional space for filtering/packing • Better depth bound?
  • 30. 30 Parallel sorting - work and depth • For an array of size n, there are O(n²) pairs of elements • Comparing all of them gives us all the information needed (there is redundant information, but let’s just store all of it) • The comparisons tell us: • For an element x, the relevant comparisons tell us how many elements are smaller than it • That count can be computed by a parallel reduce • That count is the rank of x! • Directly write x to the rank-th location of the output • The work is O(n²) since we compare all pairs • The depth is O(log n) because of the reduce algorithm • This algorithm actually parallelizes selection sort
  • 32. 32 Linked Lists • Linked lists are simple and important data structures • Sometimes we have a tree of nodes with pointers indicating their parents • We want to know the rank of each node (e.g., the distance to the head/tail)
  • 33. List Ranking Source: “Parallel Algorithms” by Guy E. Blelloch and Bruce M. Maggs • Input array P, where P[i] = j means the i-th element’s parent is the j-th element • In practice the input can be a linked list with next/parent pointers • Follow the pointers until reaching the root
  • 34. Work-Efficient List Ranking ListRanking(list P) 1. If the list has two or fewer nodes, return //base case 2. Every node flips a fair coin 3. For each node u (except the last node), if u flipped Tails and P[u] flipped Heads, then u is paired with P[u], which is spliced out: A. rank(u) = rank(u) + rank(P[u]) B. P[u] = P[P[u]] 4. Recursively call ListRanking on the smaller list 5. Insert each contracted node v back into the list with rank(v) = rank(v) + rank(P[v]) • Idea: reduce the problem size by a constant factor per round, and apply the algorithm recursively • A node is removed iff it flipped Heads and its previous node flipped Tails [Figure: a 6-node example with ranks 1 1 1 1 0 1 and coin flips T H H T H T; after contraction the remaining nodes have ranks 2 1 2 0] Source: MIT 6.886 by Julian Shun
  • 35. Work-Efficient List Ranking [Figure: the 6-node example with ranks 1 1 1 1 0 1 and flips T H H T H T — contract + packing gives ranks 2 1 2 0; applying the algorithm recursively gives 5 3 2 0; expanding reinserts the removed nodes, giving final ranks 5 3 2 1 0 4] Source: MIT 6.886 by Julian Shun
  • 36. 36 Work-Depth Analysis • The number of nodes per round is reduced by (n−1)/4 in expectation • For every node u except the last, the probability that u flips Heads and its previous element flips Tails is 1/4 • => a node gets removed with probability 1/4 • Each round takes linear work and O(log n) depth • Expected work: W(n) ≤ W(3n/4) + O(n), so W = O(n) • Expected depth: D(n) ≤ D(3n/4) + O(log n), so D = O(log² n) in arbitrary-forking • Randomization is our good friend!!! Source: MIT 6.886 by Julian Shun
  • 37. 37 Parallel sorting algorithms • Quicksort • O(n log n) expected work, O(log² n) depth • Mergesort • O(n log n) work, O(log³ n) depth //can be reduced to O(log² n) depth with a simple variant, you’ll see it in your homework • Selection sort • O(n²) work, O(log n) depth • In parallel algorithm design, it is likely that to get better depth, you need to pay more work – there is a tradeoff • Choose the best algorithm depending on your application
  • 38. 38 Parallel sorting algorithms • Quicksort • O(n log n) expected work, O(log² n) depth • Mergesort • O(n log n) work, O(log³ n) depth //can be reduced to O(log² n) depth with a simple variant, you’ll see it in your homework • Selection sort • O(n²) work, O(log n) depth • Usually, parallel sample sort has the best performance in practice – we’ll cover that in the lectures about I/O efficiency • The techniques in quicksort and mergesort are also useful for sample sort
  • 39. 39 Course Project • If you plan to work on sorting algorithms – you can start thinking • Generally, what you can consider: • Evaluate and compare several existing algorithms • Test the influence of some factors on the performance of one or several algorithms • E.g., different environments/compilers/compilation settings/schedulers/input instances (input distribution, graph structure, etc.)/machines/# of cores/cache sizes/… • Implement a fairly complicated algorithm we learned in class • Write down how you made it correct and improved its performance, step by step • Propose optimizations for existing algorithms • Propose a new algorithm
  • 40. 40 Course Project – what to do • Motivation/intro/background • What is the definition of the problem? What are the applications? How do existing solutions solve it? What is the high-level idea/intuition of your project? • Methodology • Pseudocode of the algorithm you implemented? What optimizations do you use, and why do you think they would help? Is there any theoretical guarantee for your methodology? • Experiments • Evaluate different settings • Compare with others’ implementations • Did your optimization work? If so, how much does it help? If not, do you have a theory why? • Conclusion • What does the result tell you? What did you learn from the project? Are there things that you tried but didn’t work? Is there any potential future work?