CS 332: Algorithms
  Linear-Time Sorting Continued
  Medians and Order Statistics
Review: Comparison Sorts
  Comparison sorts: Ω(n lg n) comparisons in the worst case, at best
  Model the sort with a decision tree
    Path down the tree = execution trace of the algorithm
    Leaves of the tree = possible permutations of the input
    The tree must have at least n! leaves, so its height is Ω(n lg n)
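A short worked bound may help connect the n! leaves to the height claim. This is a sketch of the usual counting argument; the intermediate inequality is a common textbook step, not something taken from the slides. A binary tree of height h has at most 2^h leaves, and the largest n/2 factors of n! are each at least n/2, so:

```latex
2^{h} \ge n!
\quad\Longrightarrow\quad
h \ge \lg(n!) \ge \lg\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right)
  = \tfrac{n}{2}\,\lg\tfrac{n}{2}
  = \Omega(n \lg n)
```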
Review: Counting Sort
  Counting sort:
    Assumption: input is in the range 1..k
    Basic idea:
      Count the number of elements ≤ each element i
      Use that count to place i in its position in the sorted array
    No comparisons! Runs in time O(n + k)
    Stable sort
    Does not sort in place:
      O(n) array to hold sorted output
      O(k) array for scratch storage
Review: Counting Sort
  CountingSort(A, B, k)
      for i=1 to k
          C[i] = 0;
      for j=1 to n
          C[A[j]] += 1;
      for i=2 to k
          C[i] = C[i] + C[i-1];
      for j=n downto 1
          B[C[A[j]]] = A[j];
          C[A[j]] -= 1;
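The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the course's reference code: the function name `counting_sort` is an assumption, and the output array is returned rather than passed in as B. Python lists are 0-indexed, so the position C[A[j]] from the pseudocode is shifted down by one.

```python
def counting_sort(a, k):
    """Return a stably sorted copy of a, whose values lie in 1..k."""
    c = [0] * (k + 1)            # c[0] is unused because values start at 1
    for x in a:
        c[x] += 1                # c[x] = number of elements equal to x
    for i in range(2, k + 1):
        c[i] += c[i - 1]         # c[i] = number of elements <= i
    b = [None] * len(a)
    for x in reversed(a):        # right-to-left pass keeps equal keys in order
        b[c[x] - 1] = x          # -1 converts the 1-based position to 0-based
        c[x] -= 1
    return b
```

For example, `counting_sort([3, 1, 2, 3, 1], 3)` walks the input from the right and places each value at the last free slot reserved for it.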
Review: Radix Sort
  How did IBM get rich originally?
  Answer: punched card readers for census tabulation in the early 1900s
    In particular, a card sorter that could sort cards into different bins
      Each column can be punched in 12 places
      Decimal digits use 10 places
    Problem: only one column can be sorted on at a time
Review: Radix Sort
  Intuitively, you might sort on the most significant digit, then the second most significant, etc.
  Problem: lots of intermediate piles of cards (read: scratch arrays) to keep track of
  Key idea: sort on the least significant digit first
    RadixSort(A, d)
        for i=1 to d
            StableSort(A) on digit i
  Example: Fig 9.3
Radix Sort
  Can we prove it will work?
  Sketch of an inductive argument (induction on the number of passes):
    Assume the numbers are sorted on the lower-order digits {j: j < i}
    Show that sorting on the next digit i leaves the array correctly sorted
      If two digits at position i are different, ordering the numbers by that digit is correct (lower-order digits irrelevant)
      If they are the same, the numbers are already sorted on the lower-order digits. Since we use a stable sort, the numbers stay in the right order
Radix Sort
  What sort will we use to sort on digits?
  Counting sort is the obvious choice:
    Sort n numbers on digits that range from 1..k
    Time: O(n + k)
  Each pass over the n numbers takes time O(n + k), so with d digits the total time is O(dn + dk)
    When d is constant and k = O(n), this takes O(n) time
  How many bits in a computer word?
Radix Sort
  Problem: sort 1 million 64-bit numbers
    Treat as four-digit radix-2^16 numbers
    Can sort in just four passes with radix sort!
  Compares well with a typical O(n lg n) comparison sort
    Requires approx lg n = 20 operations per number being sorted
  So why would we ever use anything but radix sort?
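The four-pass idea can be sketched as below: each 64-bit number is treated as four 16-bit digits, and each pass is a stable counting sort on one digit, least significant first. The function name `radix_sort_u64` and the `bits_per_digit` parameter are illustrative assumptions, and the per-pass counting sort uses start offsets with a left-to-right scan, which is an equivalent stable variant of the pseudocode shown earlier.

```python
def radix_sort_u64(nums, bits_per_digit=16):
    """LSD radix sort of non-negative integers below 2**64; returns a new list."""
    radix = 1 << bits_per_digit
    mask = radix - 1
    for shift in range(0, 64, bits_per_digit):   # four passes for 16-bit digits
        count = [0] * radix
        for x in nums:
            count[(x >> shift) & mask] += 1
        # Turn counts into the starting output index for each digit value.
        total = 0
        for d in range(radix):
            count[d], total = total, total + count[d]
        out = [0] * len(nums)
        for x in nums:                           # left-to-right scan keeps the pass stable
            d = (x >> shift) & mask
            out[count[d]] = x
            count[d] += 1
        nums = out
    return nums
```

Each pass costs O(n + 2^16), so the scratch count array only pays off when n is reasonably large compared with the radix.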
Radix Sort
  In general, radix sort based on counting sort is
    Fast
    Asymptotically fast (i.e., O(n))
    Simple to code
    A good choice
  To think about: Can radix sort be used on floating-point numbers?
Summary: Radix Sort
  Radix sort:
    Assumption: input has d digits ranging from 0 to k
    Basic idea:
      Sort elements by digit starting with the least significant
      Use a stable sort (like counting sort) for each stage
    Each pass over the n numbers takes time O(n + k), so with d digits the total time is O(dn + dk)
      When d is constant and k = O(n), this takes O(n) time
    Fast! Stable! Simple!
    Doesn't sort in place
Bucket Sort
  Bucket sort
    Assumption: input is n reals from [0, 1)
    Basic idea:
      Create n linked lists (buckets) to divide the interval [0, 1) into subintervals of size 1/n
      Add each input element to the appropriate bucket and sort the buckets with insertion sort
    Uniform input distribution → O(1) expected bucket size
      Therefore the expected total time is O(n)
    These ideas will return when we study hash tables
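A minimal sketch of the bucket sort described above, assuming Python lists stand in for the linked lists. The name `bucket_sort` is illustrative, and the `min(..., n - 1)` guard is a defensive addition against floating-point rounding rather than part of the slide's description.

```python
def bucket_sort(values):
    """Sort reals drawn from [0, 1) using n buckets of width 1/n."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in values:
        # Bucket index is floor(n * x); the guard keeps it in range.
        buckets[min(int(n * x), n - 1)].append(x)
    result = []
    for b in buckets:
        # Insertion sort: cheap because each bucket is expected to be tiny.
        for i in range(1, len(b)):
            key = b[i]
            j = i - 1
            while j >= 0 and b[j] > key:
                b[j + 1] = b[j]
                j -= 1
            b[j + 1] = key
        result.extend(b)
    return result
```

If the input is far from uniform, most elements land in a few buckets and the insertion sorts dominate, which is why the O(n) bound is only an expectation.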
Order Statistics
  The ith order statistic in a set of n elements is the ith smallest element
  The minimum is thus the 1st order statistic
  The maximum is (duh) the nth order statistic
  The median is the n/2 order statistic
    If n is even, there are 2 medians
  How can we calculate order statistics?
  What is the running time?
Order Statistics
  How many comparisons are needed to find the minimum element in a set? The maximum?
  Can we find the minimum and maximum with less than twice the cost?
  Yes:
    Walk through the elements by pairs
      Compare the two elements in each pair to each other
      Compare the larger to the maximum, the smaller to the minimum
    Total cost: 3 comparisons per 2 elements, about 3n/2 comparisons in all
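The pairwise walk can be sketched as follows. The function name `min_max` and the returned comparison counter are illustrative additions so the 3-per-pair cost is visible; how the first one or two elements seed the running minimum and maximum is one common choice, not something the slide specifies.

```python
def min_max(items):
    """Return (minimum, maximum, comparisons) using about 3n/2 comparisons."""
    n = len(items)
    if n == 0:
        raise ValueError("min_max() needs at least one element")
    comparisons = 0
    if n % 2:                        # odd n: seed with the first element
        lo = hi = items[0]
        start = 1
    else:                            # even n: seed with the first pair
        comparisons += 1
        if items[0] < items[1]:
            lo, hi = items[0], items[1]
        else:
            lo, hi = items[1], items[0]
        start = 2
    for i in range(start, n, 2):
        a, b = items[i], items[i + 1]
        comparisons += 3             # one within the pair, one vs lo, one vs hi
        if a > b:
            a, b = b, a              # now a is the smaller, b the larger
        if a < lo:
            lo = a
        if b > hi:
            hi = b
    return lo, hi, comparisons
```

On eight elements this uses 1 + 3·3 = 10 comparisons, compared with 14 for two separate scans.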
Finding Order Statistics: The Selection Problem
  A more interesting problem is selection: finding the ith smallest element of a set
  We will show:
    A practical randomized algorithm with O(n) expected running time
    A cool algorithm of theoretical interest only with O(n) worst-case running time
Randomized Selection
  Key idea: use partition() from quicksort
    But, only need to examine one subarray
    This savings shows up in the running time: O(n)
  We will again use a slightly different partition than the book:
    q = RandomizedPartition(A, p, r)
  [Figure: array A[p..r] with the pivot at index q; elements ≤ A[q] to its left, elements ≥ A[q] to its right]
Randomized Selection
  RandomizedSelect(A, p, r, i)
      if (p == r) then return A[p];
      q = RandomizedPartition(A, p, r)
      k = q - p + 1;
      if (i == k) then return A[q];   // not in book
      if (i < k) then
          return RandomizedSelect(A, p, q-1, i);
      else
          return RandomizedSelect(A, q+1, r, i-k);
  [Figure: array A[p..r] with the pivot at index q; elements ≤ A[q] to its left, elements ≥ A[q] to its right; k = q - p + 1 elements in A[p..q]]
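A runnable sketch of the pseudocode above. The slides do not show RandomizedPartition, so the version here (random pivot swapped to the end, then a single left-to-right sweep that returns the pivot's final index) is an assumption. The recursion is also written as a loop, because the worst case recurses n deep and would hit Python's recursion limit.

```python
import random

def randomized_partition(a, p, r):
    """Partition a[p..r] in place around a random pivot; return its final index."""
    s = random.randint(p, r)
    a[s], a[r] = a[r], a[s]          # move the chosen pivot to the end
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # pivot lands between the two sides
    return i + 1

def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r], inclusive bounds."""
    while True:
        if p == r:
            return a[p]
        q = randomized_partition(a, p, r)
        k = q - p + 1                # rank of the pivot within a[p..r]
        if i == k:
            return a[q]
        if i < k:
            r = q - 1                # continue in the left side
        else:
            p = q + 1                # continue in the right side
            i -= k
```

Note that the list is rearranged in place, so callers who need the original order should pass a copy.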
Randomized Selection
  Analyzing RandomizedSelect()
    Worst case: partition always 0:n-1
      T(n) = T(n-1) + O(n) = O(n^2) (arithmetic series)
      No better than sorting!
    "Best" case: suppose a 9:1 partition
      T(n) = T(9n/10) + O(n) = O(n) (Master Theorem, case 3)
      Better than sorting!
      What if this had been a 99:1 split?
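One way to check the 9:1 claim without the Master Theorem is to unroll the recurrence into a geometric series. In this sketch the constant a is an assumed name for the hidden constant in the O(n) term:

```latex
T(n) \le an + \tfrac{9}{10}an + \left(\tfrac{9}{10}\right)^{2} an + \cdots
     \le an \sum_{j=0}^{\infty} \left(\tfrac{9}{10}\right)^{j}
     = 10an = O(n)
```

The same unrolling for a 99:1 split gives a sum of (99/100)^j, which is 100an: still linear, only with a larger constant.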
Randomized Selection
  Average case
    For an upper bound, assume the ith element always falls in the larger side of the partition
    Let's show that T(n) = O(n) by substitution
Randomized Selection
  Assume T(n) ≤ cn for sufficiently large c. Steps of the derivation:
    The recurrence we started with
    Substitute T(n) ≤ cn for T(k)
    "Split" the recurrence
    Expand arithmetic series
    Multiply it out
Randomized Selection
  Assume T(n) ≤ cn for sufficiently large c. Steps of the derivation, continued:
    The recurrence so far
    Multiply it out
    Subtract c/2
    Rearrange the arithmetic
    What we set out to prove
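The equations behind these step labels are not preserved in this transcript. What follows is a reconstruction along the lines of the standard textbook substitution argument, so treat the exact form (n assumed even, summation limits, the Θ(n) term) as an assumption rather than the slide's own text:

```latex
\begin{aligned}
T(n) &\le \frac{2}{n}\sum_{k=n/2}^{n-1} T(k) + \Theta(n)
      && \text{the recurrence we started with} \\
     &\le \frac{2}{n}\sum_{k=n/2}^{n-1} ck + \Theta(n)
      && \text{substitute } T(k) \le ck \\
     &= \frac{2c}{n}\left(\sum_{k=1}^{n-1} k - \sum_{k=1}^{n/2-1} k\right) + \Theta(n)
      && \text{split the sum} \\
     &= \frac{2c}{n}\left(\frac{(n-1)n}{2} - \frac{1}{2}\left(\frac{n}{2}-1\right)\frac{n}{2}\right) + \Theta(n)
      && \text{expand arithmetic series} \\
     &= c(n-1) - \frac{c}{2}\left(\frac{n}{2}-1\right) + \Theta(n)
      && \text{multiply it out} \\
     &= cn - c - \frac{cn}{4} + \frac{c}{2} + \Theta(n)
      && \text{multiply it out} \\
     &= cn - \frac{cn}{4} - \frac{c}{2} + \Theta(n)
      && \text{subtract } c/2 \\
     &= cn - \left(\frac{cn}{4} + \frac{c}{2} - \Theta(n)\right)
      && \text{rearrange the arithmetic} \\
     &\le cn
      && \text{if } c \text{ is big enough}
\end{aligned}
```

The last step holds once c is large enough that cn/4 + c/2 outweighs the Θ(n) term.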
Worst-Case Linear-Time Selection
  Randomized algorithm works well in practice
  What follows is a worst-case linear-time algorithm, really of theoretical interest only
  Basic idea:
    Generate a good partitioning element
    Call this element x
Worst-Case Linear-Time Selection
  The algorithm in words:
    1. Divide the n elements into groups of 5
    2. Find the median of each group (How? How long?)
    3. Use Select() recursively to find the median x of the ⌊n/5⌋ medians
    4. Partition the n elements around x. Let k = rank(x)
    5. if (i == k) then return x
       if (i < k) then use Select() recursively to find the ith smallest element in the first partition
       else (i > k) use Select() recursively to find the (i-k)th smallest element in the last partition
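The five steps above can be sketched in Python as follows. Several details are assumptions made for the illustration: the name `select`, sorting each group of at most 5 to find its median, keeping a short final group instead of discarding it, and a three-way partition (less than, equal to, greater than x) so that duplicate values are handled. The narrowing in step 5 is written as a loop; only the median-of-medians call in step 3 recurses.

```python
def select(items, i):
    """Return the i-th smallest (1-based) element of items via median of medians."""
    if not 1 <= i <= len(items):
        raise IndexError("rank out of range")
    a = list(items)
    while True:
        if len(a) <= 5:
            return sorted(a)[i - 1]
        # Steps 1-2: split into groups of 5 and take each group's median.
        medians = []
        for j in range(0, len(a), 5):
            group = sorted(a[j:j + 5])          # constant work per group
            medians.append(group[(len(group) - 1) // 2])
        # Step 3: recursively find the median x of the medians.
        x = select(medians, (len(medians) + 1) // 2)
        # Step 4: partition around x (three-way, to cope with duplicates).
        lower = [v for v in a if v < x]
        upper = [v for v in a if v > x]
        equal = len(a) - len(lower) - len(upper)
        # Step 5: return x or continue in the side that contains rank i.
        if i <= len(lower):
            a = lower
        elif i <= len(lower) + equal:
            return x
        else:
            i -= len(lower) + equal
            a = upper
```

This version copies sublists for clarity; an in-place partition would match the slide's description more closely at the cost of more index bookkeeping.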
Worst-Case Linear-Time Selection
  (Sketch situation on the board)
  How many of the 5-element medians are ≤ x?
    At least 1/2 of the medians = ⌊⌊n/5⌋ / 2⌋ = ⌊n/10⌋
  How many elements are ≤ x?
    At least 3⌊n/10⌋ elements
  For large n, 3⌊n/10⌋ ≥ n/4 (How large?)
  So at least n/4 elements ≤ x
  Similarly: at least n/4 elements ≥ x
Worst-Case Linear-Time Selection
  Thus after partitioning around x, step 5 will call Select() on at most 3n/4 elements
  The recurrence is therefore:
    T(n) ≤ T(⌊n/5⌋) + T(3n/4) + Θ(n)
         ≤ T(n/5) + T(3n/4) + Θ(n)       ⌊n/5⌋ ≤ n/5
         ≤ cn/5 + 3cn/4 + Θ(n)           Substitute T(n) = cn
         = 19cn/20 + Θ(n)                Combine fractions
         = cn - (cn/20 - Θ(n))           Express in desired form
         ≤ cn  if c is big enough        What we set out to prove
Worst-Case Linear-Time Selection
  Intuitively:
    Work at each level is a constant fraction (19/20) smaller
      Geometric progression!
    Thus the O(n) work at the root dominates
Linear-Time Median Selection
  Given a "black box" O(n) median algorithm, what can we do?
    ith order statistic:
      Find median x
      Partition input around x
      if (i ≤ (n+1)/2) recursively find the ith element of the first half
      else find the (i - (n+1)/2)th element in the second half
      T(n) = T(n/2) + O(n) = O(n)
    Can you think of an application to sorting?
Linear-Time Median Selection
  Worst-case O(n lg n) quicksort
    Find median x and partition around it
    Recursively quicksort the two halves
    T(n) = 2T(n/2) + O(n) = O(n lg n)
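A sketch of the median-pivot quicksort idea, with the "black box" passed in as a function. The names `median_quicksort` and `median_of` are hypothetical, and the default stand-in below finds the median by sorting, which is not O(n); a real linear-time selection routine would have to be supplied to get the stated O(n lg n) worst case.

```python
def median_quicksort(a, median_of=None):
    """Quicksort that always pivots on the median, so each side has at most n/2 items."""
    if median_of is None:
        # Stand-in only: sorting to find the median costs O(n lg n), not O(n).
        def median_of(xs):
            return sorted(xs)[(len(xs) - 1) // 2]
    if len(a) <= 1:
        return list(a)
    x = median_of(a)
    lower = [v for v in a if v < x]      # partition around the median
    equal = [v for v in a if v == x]
    upper = [v for v in a if v > x]
    return median_quicksort(lower, median_of) + equal + median_quicksort(upper, median_of)
```

Because the pivot is the median, the recursion depth is at most about lg n regardless of the input order.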
The End

lecture 10
