Lecture 8
Sorting
Sorting (Chapter 7)
We have a list of real numbers.
Need to sort the real numbers in increasing order
(smallest first).
Important points to remember:
We have O(N²) algorithms,
and O(N log N) algorithms (heap sort).
In general, any comparison-based sorting algorithm must be Ω(N log N)
(we will prove it).
In special cases, we can have O(N) sorting algorithms.
Insertion Sort
Section 7.2 (Weiss)
Initially P = 1.
Let the first P elements be sorted.
(3) Then the (P+1)th element is inserted properly into
the list so that now the first P+1 elements are sorted.
Now increment P and go to step (3).
The (P+1)th element is inserted as follows:
Store the (P+1)th element in a temporary variable,
temp.
If the Pth element is greater than temp, then the (P+1)th element
is set equal to the Pth one;
if the (P-1)th element is greater than temp, then the Pth element
is set equal to the (P-1)th one, and so on.
Continue like this till some kth element is less than or equal
to temp, or you reach the first position.
Let this stop at the kth position (k can be 1). Set the (k+1)th
element equal to temp.
Extended Example
Need to sort:
34 8 64 51 32 21
P = 1;
Looking at the first element only, we do not
change anything.
P = 2;
Temp = 8;
34 > Temp, so the second element is set to 34.
We have reached the front of the list, so we stop there. Thus the
first position is set equal to Temp.
After second pass;
8 34 64 51 32 21
Now, the first two elements are sorted.
Set P = 3.
Temp = 64; 34 < 64, so stop at the 3rd
position and set the 3rd position = 64.
After third pass: 8 34 64 51 32 21
P = 4, Temp = 51: 51 < 64, so we have 8 34 64 64 32 21;
34 < 51, so stop at the 2nd position and set the 3rd
position = Temp.
List is 8 34 51 64 32 21
Now P = 5, Temp = 32: 32 < 64, so 8 34 51 64 64 21;
32 < 51, so 8 34 51 51 64 21; next 32 < 34, so 8 34
34 51 64 21; next 32 > 8, so we stop at the first position
and set the second position = 32. We have
8 32 34 51 64 21.
Now P = 6, Temp = 21: 21 is shifted past 64, 51, 34, and 32,
and 21 > 8, so we have 8 21 32 34 51 64.
Pseudo Code
Assume that the list is stored in an array, A (can do with a
linked list as well)
InsertionSort(A[], int N)
{
  for (P = 1; P < N; P++)
  {
    Temp = A[P];
    for (j = P; j > 0 && A[j-1] > Temp; j--)
      A[j] = A[j-1];
    A[j] = Temp;
  }
}
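The pseudocode above can be checked against a runnable sketch (Python used here purely for illustration):

```python
def insertion_sort(a):
    """In-place insertion sort, mirroring the pseudocode above."""
    for p in range(1, len(a)):
        temp = a[p]
        j = p
        # Shift elements greater than temp one position to the right.
        while j > 0 and a[j - 1] > temp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = temp
    return a

print(insertion_sort([34, 8, 64, 51, 32, 21]))  # the lecture's example
```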
Complexity Analysis
The inner loop is executed at most P times, for each P.
Because of the outer loop, P changes from 1 to N-1.
Thus overall we have at most 1 + 2 + ... + (N-1) steps.
This is O(N²).
Space requirement: O(N)
Heap Sort
Section 7.5
Complexity: O(N log N)
Suppose we need to store the sorted elements, in this case
we need an additional array. Total space is 2N. This is O(N),
but can we do with a space of N positions only?
Whenever we do a deletemin, the heap size shrinks by one.
We can store the retrieved elements in the freed positions.
We have a heap of size N.
After the first deletemin, the heap occupies positions 1 to N-1.
Store the retrieved element at position N.
After the second deletemin, the heap occupies positions 1 to
N-2, so the retrieved element is stored at N-1, and so on.
With a min-heap this leaves the array in decreasing order; using
deletemax on a max-heap instead leaves it in increasing order.
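The slides describe repeated deletemin on a min-heap, filling freed slots from the back; the symmetric max-heap variant below (repeated deletemax) leaves the array in increasing order. A minimal in-place sketch, Python for illustration:

```python
def heapsort(a):
    """In-place heapsort: build a max-heap, then repeatedly swap the
    root (current maximum) into the shrinking tail of the array."""
    n = len(a)

    def sift_down(i, size):
        # Restore the heap property for the subtree rooted at i.
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < size and a[l] > a[largest]:
                largest = l
            if r < size and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):    # build the heap
        sift_down(i, n)
    for end in range(n - 1, 0, -1):        # one "deletemax" per step
        a[0], a[end] = a[end], a[0]        # store the max in the freed slot
        sift_down(0, end)
    return a

print(heapsort([34, 8, 64, 51, 32, 21]))
```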
Merge Sort
Divide and Conquer
Divide the list into two smaller lists:
Sort the smaller lists.
Merge the sorted lists to get an overall sorted list.
How do we merge 2 sorted lists?
Consider 2 sorted arrays A and B
We will merge them into a sorted array C.
Have three variables, posA, posB, posC.
Initially, posA and posB are at the respective first positions
of A and B, and posC is at the first position of C.
(3) Compare the elements at posA and posB.
Whichever is smaller goes into the posC position of C,
and the position is advanced in the corresponding array.
posC is also advanced (e.g., if the element of A goes into C,
then posA is advanced).
Go to step (3).
After all the elements are exhausted in one array, the
remaining elements of the other array are copied into C.
Suppose we want to merge
1 13 24 26
2 15 27 38
posA at 1, posB at 2:   Merged list: 1
posA at 13, posB at 2:  Merged list: 1, 2
posA at 13, posB at 15: Merged list: 1, 2, 13
posA at 24, posB at 15: Merged list: 1, 2, 13, 15
posA at 24, posB at 27: Merged list: 1, 2, 13, 15, 24
posA at 26, posB at 27: Merged list: 1, 2, 13, 15, 24, 26
A ends, so the remainder of B is appended: 1, 2, 13, 15, 24, 26, 27, 38
Complexity of the merging? O(m + n) for lists of sizes m and n.
Pseudocode for merging
Merge(A, B, C)
{
  posA = 0; posB = 0; posC = 0;
  while ((posA < sizeA) && (posB < sizeB))
  {
    if (A[posA] < B[posB])
    { C[posC] = A[posA]; posA = posA + 1; }
    else
    { C[posC] = B[posB]; posB = posB + 1; }
    posC = posC + 1;
  }
  while (posB < sizeB)
  { C[posC] = B[posB]; increment posC and posB; }
  while (posA < sizeA)
  { C[posC] = A[posA]; increment posC and posA; }
}
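The merging pseudocode corresponds to this runnable sketch (Python, illustrative):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list in O(sizeA + sizeB)."""
    c = []
    posA = posB = 0
    while posA < len(a) and posB < len(b):
        if a[posA] < b[posB]:
            c.append(a[posA]); posA += 1
        else:
            c.append(b[posB]); posB += 1
    c.extend(a[posA:])   # at most one of these two is non-empty
    c.extend(b[posB:])
    return c

print(merge([1, 13, 24, 26], [2, 15, 27, 38]))
```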
Overall approach
Divide an array into 2 parts
Recursively sort the 2 parts (redivide, recursively sort,
sorting one element is easy)
Merge the 2 parts
Mergesort(A, left, right)
{
  if (left >= right) return;
  center = (left + right)/2;
  Mergesort(A, left, center);
  Mergesort(A, center+1, right);
  B = left half of A; C = right half of A;
  Merge(B, C, D); copy D back into A[left..right];
}
Need to sort:
34 8 64 51 32 21
Sort 34 8 64 : 8, 34, 64
Sort 51 32 21: 21, 32, 51
Merge the two:
We have 8, 21, 32, 34,51,64
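The divide, recursively sort, and merge steps can be sketched as one self-contained function (Python, illustrative; this version returns a new list instead of modifying A in place):

```python
def merge_sort(a):
    """Merge sort: divide, recursively sort, merge. Returns a new list."""
    if len(a) <= 1:               # one element is already sorted
        return list(a)
    center = len(a) // 2
    left = merge_sort(a[:center])
    right = merge_sort(a[center:])
    # Merge the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([34, 8, 64, 51, 32, 21]))
```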
T(n) = 2T(n/2) + cn
T(1) = 1
Using the master theorem, we get T(n) = O(n log n).
Any problem with merge sort?
Additional space
Quick Sort
Storage: O(n)
We will choose a pivot element in the list.
Partition the array into two parts.
One contains elements smaller than this pivot,
another contains elements larger than this pivot.
Recursively sort and merge the two partitions.
Section 7.7
How do we merge?
We do not need any additional space for merging: the two sorted
partitions, with the pivot between them, already form the sorted
array. Thus storage is O(n).
How can the approach become bad?
Want to sort 1 2 3 4 5 6 ... n. Choose 1 as the pivot.
The partitions are {1} and 2 3 4 5 6 ... n.
To get these partitions we spent? O(n-1).
Let the next pivot be 2; the partitions are {2} and 3 4 5 6 ... n.
Spent how much? O(n-2).
Overall, how much do we spend for partitioning? O(n²).
Moral of the story: We would like the partitions to be as
equal as possible.
This depends on the choice of the pivot.
Equal partitions are assured if we can choose our pivot as
the element which is in the middle of the sorted list.
It is not easy to find that.
Pivots are chosen randomly.
In the worst case, the pivot choice can make one partition have
n-1 elements and the other partition 1 element, giving O(n²).
On average this will not happen.
Worst case complexity? O(n²), as shown in the analysis below.
Pivot can be chosen as a random element in the list.
Or choose the first, the last and the center and take the
median of the three (middle position).
We will do the latter.
We discuss the partitioning now.
Interchange the pivot with the last element
Have a pointer at the first element (P1), and one at the
second last element (P2).
Move P1 to the right skipping elements which are less
than the pivot.
Move P2 to the left skipping elements which are more
than the pivot.
Stop P1 when we encounter an element greater than or
equal to the pivot.
Stop P2 when we encounter an element less than or
equal to the pivot.
Interchange the elements pointed to by P1 and P2.
If P1 is right of P2, stop, otherwise move P1 and P2 as
before till we stop again.
Example: 8 1 4 9 6 3 5 2 7 0
First = 8, Last = 0, Median = 6, so Pivot = 6.
Interchange the pivot with the last element:
8 1 4 9 0 3 5 2 7 6    (P1 starts at 8, P2 at 7)
P1 stops at 8 (>= pivot); P2 moves left and stops at 2 (<= pivot); swap:
2 1 4 9 0 3 5 8 7 6
P1 moves right and stops at 9; P2 stops at 5; swap:
2 1 4 5 0 3 9 8 7 6    (P1 and P2 have now crossed)
When we stop, swap the element at P1 (here 9) with the last
element, which is the pivot:
2 1 4 5 0 3 6 8 7 9
Partition 1: 2 1 4 5 0 3    Partition 2: 8 7 9
At any time, can you say anything about the
elements to the left of P1?
Elements to the left of P1 are less than or
equal to the pivot.
Also, for the right of P2?
Elements to the right of P2 are greater than or equal
to the pivot.
When P1 and P2 cross, what can you say about the
elements in between P1 and P2?
They are all equal to the pivot.
Suppose P1 and P2 have crossed, and stopped and
the pivot is interchanged with P1.
How do we form the partition?
Everything including P1 and its right are in one
partition (greater).
Remaining are in the left partition.
We can also make some local optimizations to the implementation.
Complexity of partitioning? O(n).
Space? O(n): the array itself, with no extra array needed.
Procedure Summary
Partition the array
Sort the partition recursively.
8 1 4 9 6 3 5 2 7 0
2 1 4 5 0 3 6 8 7 9
Partition 1 Partition 2
0 1 2 3 4 5 6 7 8 9
Partition 1 Sorted Partition 2 Sorted
Need to do anything more? No: the merge is automatic.
Pseudocode
Quicksort(A, left, right)
{
  Find the pivot (median of three);
  Interchange the pivot and A[right];
  P1 = left; P2 = right - 1;
  newP1 = Partition(A, P1, P2, pivot);
  Interchange A[newP1] and A[right];
  Quicksort(A, left, newP1-1);
  Quicksort(A, newP1+1, right);
}
Partition(A, P1, P2, pivot)
{
  while (P1 <= P2)
  {
    while (A[P1] < pivot) increment P1;
    while (A[P2] > pivot) decrement P2;
    if (P1 <= P2)
    {
      swap A[P1] and A[P2];
      increment P1; decrement P2;
    }
  }
  newP1 = P1;
  return newP1;
}
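A runnable sketch combining the pseudocode above (Python, illustrative; it guards the inner swap and the P2 scan so the pointers cannot run off the array, details the slide pseudocode leaves implicit):

```python
def quicksort(a, left=0, right=None):
    """In-place quicksort with median-of-three pivot selection."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return a
    center = (left + right) // 2
    pivot = sorted([a[left], a[center], a[right]])[1]  # median of three
    for k in (left, center, right):       # move the pivot to the end
        if a[k] == pivot:
            a[k], a[right] = a[right], a[k]
            break
    p1, p2 = left, right - 1
    while p1 <= p2:
        while a[p1] < pivot:                 # skip elements below the pivot
            p1 += 1
        while p2 >= left and a[p2] > pivot:  # skip elements above it
            p2 -= 1
        if p1 <= p2:                         # swap only if not yet crossed
            a[p1], a[p2] = a[p2], a[p1]
            p1 += 1
            p2 -= 1
    a[p1], a[right] = a[right], a[p1]     # pivot into its final position
    quicksort(a, left, p1 - 1)
    quicksort(a, p1 + 1, right)
    return a

print(quicksort([8, 1, 4, 9, 6, 3, 5, 2, 7, 0]))
```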
Worst Case Analysis
T(n) = T(n1) + T(n2) + cn
T(1) = 1;
n1 + n2 = n
In the good case, n1 = n2 = n/2 always.
Thus T(n) = O(n log n).
In the bad case, n1 = 1 and n2 = n-1 always:
T(n) = cn + T(n-1)
     = cn + c(n-1) + T(n-2)
     ...
     = c(n + (n-1) + ... + 1)
     = cn(n+1)/2
Thus T(n) is O(n²) in the worst case.
The average case complexity is O(n log n).
Quicksort performs well for large inputs, but not so good
for small inputs.
When the divisions become small, we can use insertion
sort to sort the small divisions instead.
General Lower Bound For
Sorting
Suppose a list of n elements must be sorted.
How many orderings can we have for n elements?
Depending on the values of the n numbers, we can
have n! possible orders, e.g., the sorted output for
a, b, c can be any of abc, bac, acb, cab, cba, bca.
Section 7.9
Any comparison based sorting process can be
represented as a binary decision tree.
A node represents a comparison, and a branch represents
the outcome of a comparison.
The possible orders must be the leaves of the decision
tree, i.e., depending on certain orders we will have
certain comparison sequences, and we will reach a
certain leaf.
[Decision tree figure: the root comparison C1 compares a and b;
further comparisons C2-C6 lead to the six leaves, which are the
orderings a b c, a c b, b a c, b c a, c a b, c b a.]
Different sorting algorithms will have different
comparison orders, but the leaves are the same for all
sorting trees.
Thus any sorting algorithm for sorting n inputs can be
represented as a binary decision tree with n! leaves.
Such a tree has depth at least log₂(n!).
This means that any comparison based sorting algorithm needs
to perform at least log₂(n!) comparisons in the worst case.
log(n!) = log n + log(n-1) + ... + log 2 + log 1
        ≥ log n + log(n-1) + ... + log(n/2)
        ≥ (n/2) log(n/2)
        = (n/2) log n - n/2 = Ω(n log n)
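As a quick numerical illustration (not part of the proof), one can compare log₂(n!) with the (n/2)log(n/2) bound derived above and with n log n:

```python
import math

# Decision-tree lower bound log2(n!) versus (n/2)log2(n/2) and n*log2(n).
for n in [8, 64, 1024]:
    lower = math.log2(math.factorial(n))   # minimum comparisons needed
    derived = (n / 2) * math.log2(n / 2)   # the bound derived above
    assert derived <= lower <= n * math.log2(n)
    print(n, round(derived), round(lower), round(n * math.log2(n)))
```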
Special Case Sorting
Now we will present some linear time sorting algorithms.
These apply only when the input has a special structure, e.g.,
inputs are integers.
Counting sort
Radix sort
Bucket sort
Please follow class notes
Counting Sort
Suppose we know that the list to be sorted consists
of n integers in the range 1 to M
We will declare an array B with M positions.
Initially all positions of B contain 0.
Scan through list A
A[j] is inserted in B[A[j]]
B is scanned once again, and the nonzero elements are read out.
What do we do with equal elements?
B is an array of pointers.
Each position in the array has 2 pointers, head and tail. Tail
points to the end of a linked list, and head points to the
beginning.
A[j] is inserted at the end of the list B[A[j]]
Again, Array B is sequentially traversed and each nonempty list
is printed out.
Example: M = 10, wish to sort 8 1 9 6 5 2 7.
After the scan, positions 1 2 5 6 7 8 9 of B are non-empty.
Output: 1 2 5 6 7 8 9
Example with duplicates: M = 10, wish to sort 8 5 1 5 9 5 6 2 7.
Position 5 of B holds a list of three 5s; positions 1 2 6 7 8 9
hold one element each.
Output: 1 2 5 5 5 6 7 8 9
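The linked-list version can be sketched directly (Python, illustrative; the optional `key` parameter is an addition here, used to show that elements carrying a second field come out in stable order):

```python
def counting_sort(a, M, key=lambda x: x):
    """Counting sort for items whose key is an integer in 1..M.
    Appending to per-value lists keeps equal elements in input order,
    so the sort is stable."""
    B = [[] for _ in range(M + 1)]
    for x in a:
        B[key(x)].append(x)      # insert at the end of list B[key(x)]
    out = []
    for bucket in B:             # scan B once, read the lists out in order
        out.extend(bucket)
    return out

print(counting_sort([8, 5, 1, 5, 9, 5, 6, 2, 7], 10))
```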
Complexity? O(n + M). If M is O(n), then the complexity is O(n).
Storage? O(n + M). Storage could be large for large ranges.
Suppose we have a list of elements such that every
element has 2 fields, one an integer and the other
something else, and during sorting we just look at the integer field.
Suppose element a occurs before element b in the
input list, and a and b have the same integer
components. Can their positions be reversed in the
output if counting sort is used?
No: each element is appended at the end of its list, so equal
elements come out in input order.
Stability Property
Relative position of 2 equal elements remain the same in the
output as in the input.
Is merge sort stable? Yes (if ties are broken in favor of the
left sublist during merging).
Quick sort? No.
Insertion sort? Yes.
The stability property of counting sort will be used in designing
a new sorting algorithm, radix sort.
Radix Sort
Every integer is represented by at most d digits
(e.g., decimal representation)
Every digit has k values (k = 10 for decimal, k = 2 for
binary)
There are n integers
We sort the integers by their least significant digit
first using counting sort.
Next sort them by their second least significant digit, and so on.
13 22 331 27
Should it work?
013 022 331 027
Sort by least significant digit: 331 022 013 027
Sort by second least significant digit: 013 022 027 331
Sort by most significant digit: 013 022 027 331
Note that if the most significant digit of one number a is
less than that of another number b, then a comes before b:
the final pass sorts on the most significant digit.
However, if the most significant digits are the same
for a and b, and the difference is in the second most
significant digits (the second most significant digit of b
is less than that of a), then b comes before a.
Why? By the stability property of counting sort: the earlier
pass already placed b before a, and later passes preserve
that order for equal digits.
Complexity Analysis
There are d counting sorts.
Each counting sort works on n numbers with the range
being 0 to k.
Complexity of each counting sort is O(n + k)
Overall O(d(n+k))
Storage? O(n + k)
If just counting sort were used for these numbers, what
is the worst case complexity?
O(k^d + n) for both storage and running time.
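Radix sort is then d stable counting-sort passes, least significant digit first (Python, illustrative):

```python
def radix_sort(nums, d, k=10):
    """LSD radix sort for non-negative integers of at most d base-k
    digits; each pass is a stable counting sort on one digit."""
    for pos in range(d):                   # least significant digit first
        buckets = [[] for _ in range(k)]
        for x in nums:
            digit = (x // k ** pos) % k
            buckets[digit].append(x)       # appending keeps each pass stable
        nums = [x for b in buckets for x in b]
    return nums

print(radix_sort([13, 22, 331, 27], 3))
```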
We have sorted integers in worst case linear time.
Can we do anything similar for real numbers?
Real numbers can be sorted in average case linear time
under a certain assumption:
that the real numbers are uniformly distributed.
If we know that the real numbers belong to a certain range,
then uniform distribution means any two equal-size subranges
contain roughly the same number of the real numbers in the list.
Say the numbers are between 0 and 10; then the total number of
numbers between 1 and 3 is roughly equal to that between 7 and 9.
Bucket Sort
Bucket sort can be used to sort such real numbers in
average case linear complexity
The idea is similar to direct sequence hashing and
counting sort.
We have an array of n pointers. Each position has a
pointer pointing to a linked list.
We have a function which generates an integer from
a real number, and adds the real number in the linked
list at the position.
We have a list of n real numbers.
The function is chosen such that the elements at the position j of
the array are less than those at position k, if j < k.
The function must be such that the number of
elements in each linked list is roughly the same.
Thus we just sort the linked lists at every position.
Next we start from position 0 and print out the sorted linked
list, go to position 1 and print out the sorted linked list
(assuming it is not empty), and so on.
The output is a sorted list.
So if there are n real numbers and n positions in the
array, then each linked list has roughly a constant
number c of elements.
Thus the sorting complexity for each linked list is a
constant.
What is the printing complexity? O(n) overall.
Thus the overall complexity is O(n).
How do we find such a function?
We can if the numbers are uniformly distributed.
Suppose we know the numbers are in an interval of size M.
We divide the interval M into n equal size subintervals, and
name them 0, 1,….n-1.
Any real number in subinterval j is added to the list at
position j in the array.
Given any real number, the function actually finds the
number of the subinterval where it belongs.
Firstly, why should such a function put roughly
equal numbers of elements in each list?
Because equal-size subranges contain roughly equal
numbers of elements, by the uniform distribution
assumption.
Second, the evaluation time for the function must be a constant;
that is, the function must find the interval number in
constant time.
We will give an example function which applies to a
specific case, but such functions can be designed for
more general cases as well.
Suppose the real numbers are in the interval [0, 1].
The function ⌊nx⌋ gets the correct interval number for
any real number x in the interval [0, 1],
where n is the number of subintervals we consider.
What is ⌊nx⌋ for any number x in the interval [0, 1/n)? 0.
For any number x in the interval [1/n, 2/n)? 1.
For any number x in the interval [2/n, 3/n)? 2.
................................
Evaluation complexity for the function? O(1).
Thus we have divided the interval [0, 1] in n equal
sized subintervals, and
We have a function which returns the subinterval
number of a real number in constant time.
Example: n = 10, numbers 0.9 0.01 0.39 0.47 0.82 0.31.
⌊10x⌋ sends them to positions 9, 0, 3, 4, 8, 3 respectively:
position 0: 0.01; position 3: 0.39 → 0.31; position 4: 0.47;
position 8: 0.82; position 9: 0.9.
After sorting each list, position 3 holds 0.31 → 0.39, and the
output is 0.01 0.31 0.39 0.47 0.82 0.9.
Procedure Summary
We have a list of n real numbers.
Given any number x, we add it to the list at A[⌊nx⌋].
We sort all of the individual linked lists.
We output the sorted linked list at A[0], then at A[1], and so
on.
The output is sorted.
On average every linked list contains a constant
number of elements (as a linked list contains elements
from one subinterval, all subintervals are equal sized,
and hence they hold equal numbers of elements by the
uniform distribution assumption).
This happens if we choose n roughly equal to the number
of inputs.
Thus sorting complexity is constant for each linked list.
Thus overall sorting and printing complexity is O(n).
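The whole procedure fits in a few lines (Python, illustrative; for simplicity the index is clamped so that x = 1.0 also lands in the last bucket):

```python
def bucket_sort(nums):
    """Bucket sort for real numbers in [0, 1]: n buckets, bucket
    number floor(n * x), then sort each short bucket and concatenate."""
    n = len(nums)
    buckets = [[] for _ in range(n)]
    for x in nums:
        # floor(n*x) in constant time; clamp so x == 1.0 stays in range.
        buckets[min(int(n * x), n - 1)].append(x)
    out = []
    for b in buckets:            # read buckets 0, 1, ..., n-1 in order
        out.extend(sorted(b))    # each bucket is O(1) long on average
    return out

data = [0.9, 0.01, 0.39, 0.47, 0.82, 0.31]
print(bucket_sort(data))
```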

More Related Content

PPT
PDF
Skiena algorithm 2007 lecture08 quicksort
PPTX
CSE680-07QuickSort.pptx
PDF
searching.pdf
PPT
Unit 7 sorting
PPTX
Sorting2
PDF
Skiena algorithm 2007 lecture09 linear sorting
PPT
Unit6 C
Skiena algorithm 2007 lecture08 quicksort
CSE680-07QuickSort.pptx
searching.pdf
Unit 7 sorting
Sorting2
Skiena algorithm 2007 lecture09 linear sorting
Unit6 C

Similar to free power point ready to download right now (20)

PPTX
Divide and Conquer_Binary_Search_Merge_Sort.pptx
PPTX
Weak 11-12 Sorting update.pptxbhjiiuuuuu
PPT
3.8 quick sort
PPT
DAA-Divide and Conquer methodology, DAA 2024
PPT
lecture 10
PPT
lecture4.ppt
DOCX
Selection sort lab mannual
PDF
BCS401 ADA Second IA Test Question Bank.pdf
PPT
Introduction to basic algorithm knowledge.ppt
PPT
Algorithms and computational complexity.ppt
PDF
Daa chapter5
PPTX
Sorting pnk
PPT
Sorting algorithums > Data Structures & Algorithums
PPTX
Lecture 5_ Sorting and order statistics.pptx
PPTX
Data structure 8.pptx
PDF
Sienna 4 divideandconquer
PPT
Unit-2-Sorting (Merge+Quick+Heap+Binary Searach).ppt
DOCX
Best,worst,average case .17581556 045
PPT
Divide and conquer
PPTX
Insertion and Redix Sort
Divide and Conquer_Binary_Search_Merge_Sort.pptx
Weak 11-12 Sorting update.pptxbhjiiuuuuu
3.8 quick sort
DAA-Divide and Conquer methodology, DAA 2024
lecture 10
lecture4.ppt
Selection sort lab mannual
BCS401 ADA Second IA Test Question Bank.pdf
Introduction to basic algorithm knowledge.ppt
Algorithms and computational complexity.ppt
Daa chapter5
Sorting pnk
Sorting algorithums > Data Structures & Algorithums
Lecture 5_ Sorting and order statistics.pptx
Data structure 8.pptx
Sienna 4 divideandconquer
Unit-2-Sorting (Merge+Quick+Heap+Binary Searach).ppt
Best,worst,average case .17581556 045
Divide and conquer
Insertion and Redix Sort
Ad

Recently uploaded (20)

PDF
1_English_Language_Set_2.pdf probationary
PDF
LDMMIA Reiki Yoga Finals Review Spring Summer
PDF
Weekly quiz Compilation Jan -July 25.pdf
PDF
Hazard Identification & Risk Assessment .pdf
PDF
Indian roads congress 037 - 2012 Flexible pavement
PDF
Complications of Minimal Access Surgery at WLH
PDF
SOIL: Factor, Horizon, Process, Classification, Degradation, Conservation
PDF
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
Computing-Curriculum for Schools in Ghana
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
PPTX
Chinmaya Tiranga Azadi Quiz (Class 7-8 )
PDF
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
PPTX
UV-Visible spectroscopy..pptx UV-Visible Spectroscopy – Electronic Transition...
PPTX
Lesson notes of climatology university.
PPTX
Radiologic_Anatomy_of_the_Brachial_plexus [final].pptx
PPTX
Introduction to Building Materials
PPTX
Tissue processing ( HISTOPATHOLOGICAL TECHNIQUE
PPTX
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
1_English_Language_Set_2.pdf probationary
LDMMIA Reiki Yoga Finals Review Spring Summer
Weekly quiz Compilation Jan -July 25.pdf
Hazard Identification & Risk Assessment .pdf
Indian roads congress 037 - 2012 Flexible pavement
Complications of Minimal Access Surgery at WLH
SOIL: Factor, Horizon, Process, Classification, Degradation, Conservation
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
Computing-Curriculum for Schools in Ghana
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
Chinmaya Tiranga Azadi Quiz (Class 7-8 )
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
UV-Visible spectroscopy..pptx UV-Visible Spectroscopy – Electronic Transition...
Lesson notes of climatology university.
Radiologic_Anatomy_of_the_Brachial_plexus [final].pptx
Introduction to Building Materials
Tissue processing ( HISTOPATHOLOGICAL TECHNIQUE
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
Ad

free power point ready to download right now

  • 2. Sorting (Chapter 7) We have a list of real numbers. Need to sort the real numbers in increasing order (smallest first). Important points to remember: We have O(N2 ) algorithms O(NlogN) algorithms (heap sort) In general, any algorithm for sorting must be (NlogN) (will prove it) In special cases, we can have O(N) sorting algorithms.
  • 3. Insertion Sort Section 7.2 (Weiss) Initially P = 1 Let the first P elements be sorted. (3)Then the P+1 th element is inserted properly in the list so that now P+1 elements are sorted. Now increment P and go to step (3)
  • 4. P+1th element is inserted as follows: Store the P+1 th element first as some temporary variable, temp; If Pth element greater than temp, then P+1th element is set equal to the Pth one, If P-1 th element greater than temp, then Pth element is set equal to P-1th one….. Continue like this till some kth element is less than or equal to temp, or you reach the first position. Let this stop at kth position (k can be 1). Set the k+1th element equal to temp.
  • 5. Extended Example Need to sort: 34 8 64 51 32 21 P = 1; Looking at first element only, and we do not change. P = 2; Temp = 8; 34 > Temp, so second element is set to 34. We have reached the end of the list. We stop there. Thus, first position is set equal to Temp;
  • 6. After second pass; 8 34 64 51 32 21 Now, the first two elements are sorted. Set P = 3. Temp = 64, 34 < 64, so stop at 3rd position and set 3rd position = 64 After third pass: 8 34 64 51 32 21 P = 4, Temp = 51, 51 < 64, so we have 8 34 64 64 32 21, 34 < 51, so stop at 2nd position, set 3rd position = Temp, List is 8 34 51 64 32 21
  • 7. Now P = 5, Temp = 32, 32 < 64, so 8 34 51 64 64 21, 32 < 51, so 8 34 51 51 64 21, next 32 < 34 so, 8 34 34, 51 64 21, next 32 > 8, so we stop at first position and set second position = 32, we have 8 32 34 51 64 21, Now P = 6, We have 8 21 32 34 51 64
  • 8. Pseudo Code Assume that the list is stored in an array, A (can do with a linked list as well) Insertion Sort(A[],int N) { for (P = 1; P < N; P++) { Temp = A[P]; for (j = P; j > 0 and A[j-1] > Temp; j--) A[j] = A[j-1]; A[j] = Temp; } }
  • 9. Complexity Analysis Inner loop is executed P times, for each P Because of outer loop, P changes from 1 to N. Thus overall we have 1 + 2 +…..N. This is O(N2 ) Space requirement: O(N)
  • 10. Heap Sort Section 7.5 Complexity: O(NlogN) Suppose we need to store the sorted elements, in this case we need an additional array. Total space is 2N. This is O(N), but can we do with a space of N positions only? Whenever we do a deletemin , the heapsize shrinks by one. We can store the elements in the additional spaces.
  • 11. We have a heap of size N. After the first deletemin, the last element is at position N-1. Store the retrieved element at position N After the second deletemin, the last element is at position N- 2, so the retrieved element is stored at N-1, and so on. Array will finally have elements in increasing order
  • 12. Merge Sort Divide and Conquer Divide the list into two smaller lists: Sort the smaller lists. Merge the sorted lists to get an overall sorted list. How do we merge 2 sorted lists?
  • 13. Consider 2 sorted arrays A and B We will merge them into a sorted array C. Have three variables, posA, posB, posC Initially, both of these are at the respective first positions (3)Compare the elements at posA and posB Whichever is smaller, goes into the posC position of C, and the position is advanced in the corresponding array. posC is also advanced (e.g., if element of A goes into C, then posA is advanced). Go to step (3) After all the elements are exhausted in one array, the
  • 14. Suppose we want to merge 1 13 24 26 2 15 27 38 Pos A = 1, pos B = 2, pos C = 1 Merged List: 1 Complexity of the merging? pos A = 13, pos B = 2, pos C = 2 Merged List: 1, 2 pos A = 13, pos B = 15, pos C = 13 Merged List: 1,2, 13 pos A = 24, pos B = 15, pos C = 15 Merged List: 1,2,13,15 pos A = 24, pos B = 27, pos C = 24 Merged List: 1,2,13,15,24 pos A = 26, pos B = 27, pos C = 26 Merged List: 1,2,13,15,24,26 A ends, so remaining of B is added: 1,2,13,15,24,26,27,38 O(m + n)
  • 15. Pseudocode for merging Merge (A, B, C){ posA = 0; posB=0; posC=0; While (posA < sizeA)&& (posB < sizeB) {if (A[posA] <B[posB]) { C[posC]=A[posA]; posA = posA + 1;}
  • 16. Else {C[posC] = B[posB]; posB = posB + 1;} posC = posC + 1;} While (posB < sizeB) {C[posC] =B[posB]; Increment posC and posB;} While (posA < sizeA) {C[posC] =A[posA]; Increment posC and posA;} }
  • 17. Overall approach Divide an array into 2 parts Recursively sort the 2 parts (redivide, recursively sort, sorting one element is easy) Merge the 2 parts Mergesort(A,left,right) { if (left right) break; center = (left + right)/2; Mergesort(A,left,center); Mergesort(A,center+1,right); B = left half of A; C = right half of A; Merge(B,C,D); Set left to right of A = D;}
  • 18. Need to sort: 34 8 64 51 32 21 Sort 34 8 64 : 8, 34, 64 Sort 51 32 21: 21, 32, 51 Merge the two: We have 8, 21, 32, 34,51,64
  • 19. T(n) = 2T(n/2) + cn T(1) = 1; Using master theorem, we get T(n) as O(nlogn) Any problem with merge sort? Additional space
  • 20. Quick Sort Storage: O(n) We will choose a pivot element in the list. Partition the array into two parts. One contains elements smaller than this pivot, another contains elements larger than this pivot. Recursively sort and merge the two partitions. Section 7.7 How do we merge? We do not need any additional space for merging. Thus storage is O(n)
  • 21. How can the approach become bad? Want to sort 1 2 3 4 5 6 …n Choose 1 as pivot The partitions are 1 2 3 4 5 6… To get these partitions we spent ? Let next time pivot be 2 partitions are 2 3 4 5 6….n Spent how much? O(n-1) O(n-2) Overall how much do we spend for partitioning? O(n2 )
  • 22. Moral of the story: We would like the partitions to be as equal as possible. This depends on the choice of the pivot. Equal partitions are assured if we can choose our pivot as the element which is in the middle of the sorted list. It is not easy to find that. Pivots are chosen randomly. In worst case, pivot choice can make one partition have n-1 elements, another partition 1 element. O(n2 ) On an average this will not happen. Worst case complexity?
  • 23. Pivot can be chosen as a random element in the list. Or choose the first, the last and the center and take the median of the three (middle position). We will do the latter. We discuss the partitioning now.
  • 24. Interchange the pivot with the last element Have a pointer at the first element (P1), and one at the second last element (P2). Move P1 to the right skipping elements which are less than the pivot. Move P2 to the left skipping elements which are more than the pivot. Stop P1 when we encounter an element greater than or equal to the pivot. Stop P2 when we encounter an element lesser than or equal to the pivot.
  • 25. Interchange the elements pointed to by P1 and P2. If P1 is right of P2, stop, otherwise move P1 and P2 as before till we stop again. 8 1 4 9 6 3 5 2 7 0 When we stop, swap P1 with the last element which is the pivot First = 8 Last = 0, Median = 6, Pivot = 6 8 1 4 9 0 3 5 2 7 6 P1 P2
  • 26. 8 1 4 9 0 3 5 2 7 6 P1 P2 2 1 4 9 0 3 5 8 7 6 P1 P2 2 1 4 9 0 3 5 8 7 6 P1 P2 2 1 4 5 0 3 9 8 7 6 P2 P1 2 1 4 5 0 3 6 8 7 9 Partition 1 Partition 2
  • 27. At any time can you say anything about the elements to the left of P1? Elements to the right of P1 are greater than or equal to the pivot. When P1 and P2 cross, what can you say about the elements in between P1 and P2? They are all equal to the pivot. Also, for right of P2? Elements to the left of P1 are less than or equal to the pivot.
  • 28. Suppose P1 and P2 have crossed, and stopped and the pivot is interchanged with P1. How do we form the partition? Everything including P1 and its right are in one partition (greater). Remaining are in the left partition. We can also do some local optimizations in length. Complexity? Space? O(n) O(n)
  • 29. Procedure Summary Partition the array Sort the partition recursively. 8 1 4 9 6 3 5 2 7 0 2 1 4 5 0 3 6 8 7 9 Partition 1 Partition 2 0 1 2 3 4 5 6 7 8 9 Partition 1 Sorted Partition 2 Sorted Need to do any thing more? Merger is automatic
  • 30. Pseudocode Quicksort(A, left, right) { Find pivot; Interchange pivot and A[right]; P1 = left; P2 = right – 1; Partition(A, P1, P2, pivot); /*returns newP1*/ Interchange A[newP1] and A[right]; Quicksort(A, left, newP1-1); Quicksort(A, newP1, right); }
  • 31. Partition(A, P1, P2,pivot) { While (P1  P2) { While (A[P1] < pivot) increment P1; While (A[P2] > pivot) decrement P2; Swap A[P1] and A[P2]; increment P1; decrement P2; } newP1 = P1; return(newP1); }
  • 32. Worst Case Analysis T(n) = T(n1) + T(n2) + cn T(1) = 1; n1 + n2 = n In good case, n1 = n2 = n/2 always Thus T(n) = O(nlogn)
  • 33. In bad case, n1 = 1 n2 = n-1 always T(n) = pn + T(n-1) = pn + p(n-1) + T(n-2) …….. = p(n + n-1 +……+1) = pn2 Thus T(n) is O(n2 ) in worst case. Average case complexity is O(nlog n)
  • 34. Quicksort performs well for large inputs, but not so good for small inputs. When the divisions become small, we can use insertion sort to sort the small divisions instead.
  • 35. General Lower Bound For Sorting Suppose a list of size k must be sorted. How many orderings can we have for k members? Depending on the values of the n numbers, we can have n! possible orders, e.g., the sorted output for a,b,c can be a,b,c, b,a,c, a,c,b, c,a,b, c,b,a, b,c,a Section 7.9
  • 36. Any comparison based sorting process can be represented as a binary decision tree. A node represents a comparison, and a branch represents the outcome of a comparison. The possible orders must be the leaves of the decision tree, i.e., depending on certain orders we will have certain comparison sequences, and we will reach a certain leaf.
  • 37. a b c b a c a c b b c a c b a c a b C1 C2 C3 C4 C5 Compare a and b C6 Different sorting algorithms will have different comparison orders, but the leaves are same for all sorting trees.
  • 38. Thus any sorting algorithm for sorting n inputs can be represented as a binary decision tree of n! leaves. Such a tree has depth at least log n! This means that any comparison based sorting algorithm need to perform at least log n! Comparisons in the worst case. log n! = log n + log (n-1) +…..+log(2) + log(1) log n + log (n-1) +……+ log (n/2) (n/2)log (n/2) = (n/2)log (n) – (n/2) = (nlog n)
  • 39. Special Case Sorting Now we will present some linear time sorting algorithms. These apply only when the input has a special structure, e.g., inputs are integers. Counting sort Radix sort Bucket sort Please follow class notes
  • 40. Counting Sort Suppose we know that the list to be sorted consists of n integers in the range 1 to M We will declare an array B with M positions. Initially all positions of B contain 0. Scan through list A A[j] is inserted in B[A[j]] B is scanned once again, and the nonzero elements are read out.
  • 41. What do we do with equal elements? B is an array of pointers: each position in the array has 2 pointers, head and tail. Tail points to the end of a linked list, and head points to the beginning. A[j] is inserted at the end of the list B[A[j]]. Again, array B is sequentially traversed and each nonempty list is printed out. M = 10, wish to sort: 8 1 9 6 5 2 7. Nonempty positions of B: 1 2 5 6 7 8 9. Output: 1 2 5 6 7 8 9
  • 42. M = 10, wish to sort: 8 5 1 5 9 5 6 2 7. Nonempty positions of B: 1 2 5 6 7 8 9, with position 5 holding the list 5 → 5 → 5. Output: 1 2 5 5 5 6 7 8 9
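The head/tail linked lists above can be sketched in Python, using a deque as a stand-in for each linked list (the function and variable names are illustrative, not from the slides):

```python
from collections import deque

def counting_sort(a, m):
    # b[v] is a queue of the elements equal to v; appending at the tail
    # preserves the input order of equal elements (stability).
    b = [deque() for _ in range(m + 1)]   # values assumed in range 0..m
    for x in a:
        b[x].append(x)                     # insert at the end of list B[A[j]]
    out = []
    for bucket in b:                       # traverse B once, print nonempty lists
        out.extend(bucket)
    return out
```

With n inputs and range M, the two loops give the O(n + M) time and storage discussed on the next slide.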
  • 43. Complexity? O(n + M); if M is O(n), then the complexity is O(n). Storage? O(n + M); storage could be large for large ranges. Suppose we have a list of elements such that every element has 2 fields: one an integer, and another field something else. During sorting we just look at the integer field. Suppose element a occurs before element b in the input list, and a and b have the same integer components. Can their positions be reversed in the output if counting sort is used?
  • 44. Stability Property The relative positions of 2 equal elements remain the same in the output as in the input. Is merge sort stable? Yes. Quick sort? No. Insertion sort? Yes. The stability property of counting sort will be used in designing a new sorting algorithm, radix sort.
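Stability is easy to see with two-field elements, as in the question on the previous slide. This sketch (names are mine) sorts (integer key, payload) pairs by key only; equal keys come out in input order:

```python
def stable_counting_sort(pairs, m):
    # pairs: (integer key, payload); keys assumed in range 0..m.
    b = [[] for _ in range(m + 1)]
    for key, payload in pairs:
        b[key].append((key, payload))  # tail insertion keeps input order
    return [p for bucket in b for p in bucket]
```

The two elements with key 5 below keep their relative order, so counting sort is stable.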
  • 45. Radix Sort Every integer is represented by at most d digits (e.g., decimal representation) Every digit has k values (k = 10 for decimal, k = 2 for binary) There are n integers
  • 46. We sort the integers by their least significant digit first, using counting sort; next by the second least significant digit, and so on. Should it work? Sort 13 22 331 27, padded to 013 022 331 027. Sort by least significant digit: 331 022 013 027. Sort by second least significant digit: 013 022 027 331. Sort by most significant digit: 013 022 027 331.
  • 47. Note that if the most significant digit of one number a is less than that of another number b, then a comes before b. However, if the most significant digits are the same for a and b, and the difference is in the second most significant digits (the second most significant digit of b is less than that of a), then b comes before a. Why? By the stability property of counting sort: the earlier pass placed b before a, and the final pass, seeing equal digits, preserves that order.
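A sketch of radix sort in Python following the slides: d passes of a stable counting sort, least significant digit first. The nested-list buckets and the names are my choices, not the textbook's code.

```python
def radix_sort(a, d, k=10):
    # Sort nonnegative integers with at most d digits in base k.
    for pos in range(d):                 # pos = 0 is the least significant digit
        buckets = [[] for _ in range(k)]
        for x in a:
            digit = (x // k ** pos) % k  # extract the digit at position pos
            buckets[digit].append(x)     # tail insertion keeps the pass stable
        a = [x for b in buckets for x in b]
    return a
```

With d = 3 and k = 10 this reproduces the 13 22 331 27 example above, and the cost is d counting sorts of O(n + k) each, i.e. O(d(n + k)).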
  • 48. Complexity Analysis There are d counting sorts. Each counting sort works on n numbers with the range being 0 to k. Complexity of each counting sort is O(n + k) Overall O(d(n+k)) Storage? O(n + k)
  • 49. If just counting sort were used for these numbers, what is the worst case complexity? O(k^d + n) for storage and running time, since the range of d-digit base-k numbers is 0 to k^d − 1.
  • 50. Bucket Sort We have sorted integers in worst-case linear time. Can we do anything similar for real numbers? Real numbers can be sorted with average-case linear complexity under a certain assumption: that the real numbers are uniformly distributed. If we know that the real numbers belong to a certain range, then uniform distribution means any two equal-sized subranges contain roughly the same number of the real numbers in the list. Say the numbers are between 0 and 10; then the total count of numbers between 1 and 3 is roughly equal to that between 7 and 9.
  • 51. Bucket sort can be used to sort such real numbers with average-case linear complexity. The idea is similar to direct-sequence hashing and counting sort. We have a list of n real numbers and an array of n pointers, each position pointing to a linked list. We have a function which generates an integer position from a real number and adds the real number to the linked list at that position.
  • 52. The function is chosen such that the elements at the position j of the array are less than those at position k, if j < k. The function must be such that the number of elements in each linked list is roughly the same. Thus we just sort the linked lists at every position. Next we start from position 0, print out the sorted linked list, go to position 1, print out the sorted linked (assuming it is not empty) and so on. The output is a sorted list.
  • 53. So if there are n real numbers and n positions in the array, then each linked list holds roughly a constant number of elements, c. Thus the sorting complexity for each linked list is a constant. What is the printing complexity? O(n), since n elements are printed in total. Thus the overall complexity is O(n)
  • 54. How do we find such a function? We can if the numbers are uniformly distributed. Suppose we know the numbers are in an interval of size M. We divide the interval M into n equal size subintervals, and name them 0, 1,….n-1. Any real number in subinterval j is added to the list at position j in the array. Given any real number, the function actually finds the number of the subinterval where it belongs.
  • 55. Firstly, why should such a function put a roughly equal number of elements in each list? Because subranges of equal size contain roughly equal numbers of elements, by the uniform distribution assumption. Secondly, the evaluation time for the function must be a constant; that is, the function must find the interval number in constant time. We will give an example function which applies to a specific case, but in general such functions can be designed for more general cases as well.
  • 56. Suppose the real numbers are in the interval [0, 1]. The function ⌊nx⌋ gives the correct interval number for any real number x in the interval, where n is the number of ranges we consider. What is ⌊nx⌋ for any x in the interval [0, 1/n)? 0. For any x in [1/n, 2/n)? 1. For any x in [2/n, 3/n)? 2. …… Evaluation complexity for the function? O(1)
  • 57. Thus we have divided the interval [0, 1] into n equal-sized subintervals, and we have a function which returns the subinterval number of a real number in constant time. Example with n = 10: input 0.9 0.01 0.39 0.47 0.82 0.31. ⌊10x⌋ sends 0.01 to position 0, 0.39 and 0.31 to position 3, 0.47 to position 4, 0.82 to position 8, and 0.9 to position 9. Reading the lists out in order (position 3 sorted): 0.01 0.31 0.39 0.47 0.82 0.9
  • 58. Procedure Summary We have a list of n real numbers. Given any number x, we add it to the list at A[⌊nx⌋]. We sort each of the individual linked lists. We output the sorted linked list at A[0], then at A[1], and so on. The output is sorted.
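The summary above can be sketched in Python, assuming the inputs lie in [0, 1) and using Python lists in place of the linked lists (names are illustrative):

```python
def bucket_sort(a):
    # Sort real numbers assumed uniformly distributed in [0, 1).
    n = len(a)
    buckets = [[] for _ in range(n)]   # one list per equal-sized subinterval
    for x in a:
        buckets[int(n * x)].append(x)  # floor(n*x) is the subinterval number
    out = []
    for b in buckets:
        out.extend(sorted(b))          # each bucket holds O(1) elements on average
    return out
```

Choosing the array size equal to the input size n is what makes each bucket hold a constant number of elements on average, giving the O(n) average case discussed on the next slide.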
  • 59. On average, every linked list contains a constant number of elements (as a linked list contains elements from one subinterval, all subintervals are equal-sized, and hence they hold equal numbers of elements by the uniform distribution assumption). This happens if we choose n roughly equal to the number of inputs. Thus the sorting complexity is constant for each linked list, and the overall sorting and printing complexity is O(n).