Lecture 6: Hashing
Steven Skiena

Department of Computer Science
 State University of New York
 Stony Brook, NY 11794–4400

http://www.cs.sunysb.edu/∼skiena
Dictionary / Dynamic Set Operations
Perhaps the most important class of data structures maintains
a set of items, indexed by keys.
 • Search(S,k) – A query that, given a set S and a key value
   k, returns a pointer x to an element in S such that key[x]
   = k, or nil if no such element belongs to S.
 • Insert(S,x) – A modifying operation that augments the set
   S with the element x.
 • Delete(S,x) – Given a pointer x to an element in the set S,
   remove x from S. Observe we are given a pointer to an
   element x, not a key value.
 • Min(S), Max(S) – Returns the element of the totally
   ordered set S which has the smallest (largest) key.
 • Next(S,x), Previous(S,x) – Given an element x whose key
   is from a totally ordered set S, returns the next larger
   (next smaller) element in S, or NIL if x is the maximum
   (minimum) element.
There are a variety of implementations of these dictionary
operations, each of which yields different time bounds for
various operations.
Problem of the Day
You are given the task of reading in n numbers and then
printing them out in sorted order. Suppose you have access
to a balanced dictionary data structure, which supports each
of the operations search, insert, delete, minimum, maximum,
successor, and predecessor in O(log n) time.
 • Explain how you can use this dictionary to sort in
   O(n log n) time using only the following abstract
   operations: minimum, successor, insert, search.
 • Explain how you can use this dictionary to sort in
   O(n log n) time using only the following abstract
   operations: minimum, insert, delete, search.
 • Explain how you can use this dictionary to sort in
   O(n log n) time using only the following abstract
   operations: insert and in-order traversal.
Hash Tables
Hash tables are a very practical way to maintain a dictionary.
The idea is simply that looking an item up in an array is Θ(1)
once you have its index.
A hash function is a mathematical function which maps keys
to integers.
Collisions
A collision occurs when two or more keys are mapped to the
same bucket.
If the keys are uniformly distributed, then each bucket should
contain very few keys!
The resulting short lists are easily searched!
[Figure: hash table with buckets numbered 0 through 11]
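The bucket idea above can be sketched as a small chained hash table. This is only an illustrative sketch: the class name `ChainedHashTable`, its method names, the default of 12 buckets (chosen to match the figure), and the use of Python's built-in `hash` are all assumptions, not part of the lecture.

```python
class ChainedHashTable:
    """Dictionary sketch: an array of buckets, each bucket a short list."""

    def __init__(self, num_buckets=12):
        # One (initially empty) list per bucket; 12 is an arbitrary choice.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Map the key to a bucket index in 0 .. num_buckets - 1.
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # colliding keys share the list

    def search(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                   # plays the role of nil

    def delete(self, key):
        idx = self._index(key)
        self.buckets[idx] = [(k, v) for k, v in self.buckets[idx] if k != key]
```

Each operation only scans the one bucket the key hashes to, which is why short lists make all three operations fast.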
Hash Functions
It is the job of the hash function to map keys to integers. A
good hash function:
1. Is cheap to evaluate
2. Tends to use all positions from 0 . . . M − 1 with uniform
   frequency.
The first step is usually to map the key to a big integer, for
example
$$ h = \sum_{i=0}^{\mathrm{keylength}-1} 128^{i} \times \mathrm{char}(\mathrm{key}[i]) $$
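A minimal sketch of this first step, assuming the key is an ASCII string so that every character code is below 128. The function name `string_to_int` is hypothetical.

```python
def string_to_int(key, base=128):
    """Treat the characters of key as digits of a base-128 number.

    Character i contributes base**i * (its character code), so the
    first character is the least significant digit.
    """
    h = 0
    for i, ch in enumerate(key):
        h += (base ** i) * ord(ch)
    return h


# 'a' is 97 and 'b' is 98, so "ab" maps to 97 + 128 * 98.
print(string_to_int("ab"))
```

Python integers grow without bound, so the sum is exact; in a fixed-width language the value would overflow for long keys unless it is reduced as it is accumulated.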
Modular Arithmetic
This large number must be reduced to an integer between 0
and M − 1, where M is the size of our hash table.
One way is by h(k) = k mod M , where M is best chosen as a
large prime not too close to 2^i − 1, which would just mask off
the high bits.
This works on the same principle as a roulette wheel!
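To see why the choice of M matters, the sketch below reduces base-128 string codes with three different moduli. The helper names `string_to_int` and `bucket`, the example words, and the prime 1009 are illustrative assumptions.

```python
def string_to_int(key, base=128):
    """Base-128 code of an ASCII key, first character least significant."""
    return sum((base ** i) * ord(ch) for i, ch in enumerate(key))


def bucket(key, M):
    """Reduce the big integer to a table index in 0 .. M - 1."""
    return string_to_int(key) % M


# M = 128 is a power of two: only the lowest base-128 digit survives,
# so every key with the same first character lands in the same bucket.
print(bucket("cat", 128), bucket("cow", 128))

# M = 127 = 2^7 - 1: since 128 mod 127 = 1, the hash collapses to the
# sum of the character codes, so anagrams always collide.
print(bucket("listen", 127), bucket("silent", 127))

# A prime that is not near a power of two mixes all the characters.
print(bucket("listen", 1009), bucket("silent", 1009))
```

The first two lines each print a pair of equal values, while the last prints two different buckets for the same pair of anagrams.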
Bad Hash Functions
The first three digits of the Social Security Number
[Figure: buckets 0 through 9]
Good Hash Functions
The last three digits of the Social Security Number
[Figure: buckets 0 through 9]
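The contrast between the two choices can be sketched on synthetic data. The IDs below are made up for illustration and are not real Social Security Numbers: they assume a population whose leading three digits take only a few values while the trailing digits vary freely. The names `ids`, `first_three`, `last_three`, and `bucket_loads` are hypothetical.

```python
from collections import Counter

# Made-up 9-digit IDs: three leading prefixes, 1000 serial numbers each.
ids = [f"{area:03d}{serial:06d}"
       for area in (123, 124, 125)
       for serial in range(1000)]


def first_three(key):
    return int(key[:3])


def last_three(key):
    return int(key[-3:])


def bucket_loads(keys, h):
    """Count how many keys the hash function h sends to each bucket."""
    return Counter(h(k) for k in keys)


first = bucket_loads(ids, first_three)
last = bucket_loads(ids, last_three)

# Number of buckets used and size of the fullest bucket for each choice.
print(len(first), max(first.values()))
print(len(last), max(last.values()))
```

On this data the leading digits crowd all 3000 keys into 3 buckets, while the trailing digits spread them over 1000 buckets with 3 keys each.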
Performance on Set Operations
With either chaining or open addressing:
 • Search - O(1) expected, O(n) worst case
 • Insert - O(1) expected, O(n) worst case
 • Delete - O(1) expected, O(n) worst case
 • Min, Max and Predecessor, Successor - Θ(n + m) expected
   and worst case, where m is the number of buckets
Pragmatically, a hash table is often the best data structure
to maintain a dictionary. However, the worst-case time is
unpredictable.
The best worst-case bounds come from balanced binary
trees.
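The Θ(n + m) bound for the order-based operations follows because hashing scatters keys without regard to order, so every bucket and every key must be inspected. A minimal sketch, assuming a chained table represented as a list of m lists holding n keys in total (the function name `hash_table_min` is hypothetical):

```python
def hash_table_min(buckets):
    """Return the smallest key in a chained hash table, or None if empty.

    The outer loop visits all m buckets (even empty ones) and the inner
    loop visits all n keys, giving Theta(n + m) work in every case.
    """
    smallest = None
    for bucket in buckets:
        for key in bucket:
            if smallest is None or key < smallest:
                smallest = key
    return smallest


# Four buckets, four keys placed by key mod 4 except that order is ignored.
print(hash_table_min([[12, 24], [], [2, 14], []]))
```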
Substring Pattern Matching
Input: A text string t and a pattern string p.
Problem: Does t contain the pattern p as a substring, and if
so where?
E.g.: Is Skiena in the Bible?
Brute Force Search
The simplest algorithm to search for the presence of pattern
string p in text t overlays the pattern string at every position in
the text, and checks whether every pattern character matches
the corresponding text character.
This runs in O(nm) time, where n = |t| and m = |p|.
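A minimal sketch of the brute force search just described. The function name `brute_force_search` and the convention of returning −1 when there is no match are assumptions.

```python
def brute_force_search(t, p):
    """Return the first index where p occurs in t, or -1 if it does not."""
    n, m = len(t), len(p)
    for i in range(n - m + 1):          # every possible starting position
        j = 0
        while j < m and t[i + j] == p[j]:
            j += 1                      # characters agree so far
        if j == m:                      # all m pattern characters matched
            return i
    return -1


print(brute_force_search("hello world", "world"))
```

The outer loop runs at most n − m + 1 times and the inner loop at most m times, which gives the O(nm) bound.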
String Matching via Hashing
Suppose we compute a given hash function on both the
pattern string p and the m-character substring starting from
the ith position of t.
If these two strings are identical, clearly the resulting hash
values will be the same.
If the two strings are different, the hash values will almost
certainly be different.
False positives, where different strings happen to have equal
hash values, should be so rare that we can easily spend the
O(m) time it takes to explicitly check the identity of two
strings whenever the hash values agree.
The Catch
This reduces string matching to n − m + 2 hash value
computations (the n − m + 1 windows of t, plus one hash
of p), plus what should be a very small number of O(m) time
verification steps.
The catch is that it takes O(m) time to compute a hash
function on an m-character string, and O(n) such computations
seem to leave us with an O(mn) algorithm again.
The Trick
Look closely at our string hash function, applied to the m
characters starting from the jth position of string S:
$$ H(S, j) = \sum_{i=0}^{m-1} \alpha^{m-(i+1)} \times \mathrm{char}(s_{i+j}) $$
A little algebra reveals that
$$ H(S, j+1) = \left( H(S, j) - \alpha^{m-1} \, \mathrm{char}(s_j) \right) \alpha + \mathrm{char}(s_{j+m}) $$
Thus once we know the hash value from the jth position, we
can find the hash value from the (j + 1)st position for the
cost of two multiplications, one addition, and one subtraction,
provided α^(m−1) is computed once in advance.
This can be done in constant time.
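Putting the rolling update together with explicit verification gives the following sketch of hash-based string matching (commonly known as the Rabin-Karp algorithm). The function name `rabin_karp`, the base `alpha = 128`, and the modulus `M = 1_000_000_007` are assumptions; reducing every value mod M is added here to keep the numbers small and is not shown on the slide.

```python
def rabin_karp(t, p, alpha=128, M=1_000_000_007):
    """Return the first index where p occurs in t, or -1 if it does not."""
    n, m = len(t), len(p)
    if m == 0:
        return 0
    if m > n:
        return -1

    high = pow(alpha, m - 1, M)   # alpha^(m-1) mod M, computed once

    # Hash p and the first window of t in O(m) time (Horner's rule).
    hp = ht = 0
    for i in range(m):
        hp = (hp * alpha + ord(p[i])) % M
        ht = (ht * alpha + ord(t[i])) % M

    for j in range(n - m + 1):
        # Equal hashes may be a false positive, so verify in O(m) time.
        if hp == ht and t[j:j + m] == p:
            return j
        if j + m < n:
            # Drop t[j], shift by alpha, and bring in t[j + m].
            ht = ((ht - high * ord(t[j])) * alpha + ord(t[j + m])) % M
    return -1


print(rabin_karp("hello world", "world"))
```

Because every hash match is verified, the result stays correct even with a poor modulus; a bad choice only costs more verification steps.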
Hashing, Hashing, and Hashing
Udi Manber says that the three most important algorithms at
Yahoo are hashing, hashing, and hashing.
Hashing has a variety of clever applications beyond just
speeding up search, by giving you a short but distinctive
representation of a larger document.
 • Is this new document different from the rest in a large
   corpus? – Hash the new document, and compare it to
   the hash codes of the corpus.
 • How can I convince you that a file hasn’t changed? – Check
   if the cryptographic hash code of the file you give me
   today is the same as that of the original. Any change
   to the file will almost certainly change the hash code.
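Both applications can be sketched with Python's standard `hashlib` module. SHA-256 is an assumed choice of cryptographic hash (the lecture does not name one), and the function names `fingerprint` and `is_new_document` are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Short, distinctive representation of a document (SHA-256, hex)."""
    return hashlib.sha256(data).hexdigest()


def is_new_document(data: bytes, seen: set) -> bool:
    """Return True and record the document if its hash was not seen before."""
    code = fingerprint(data)
    if code in seen:
        return False
    seen.add(code)
    return True


# Duplicate detection: compare hash codes instead of whole documents.
corpus_hashes = set()
print(is_new_document(b"lecture notes on hashing", corpus_hashes))
print(is_new_document(b"lecture notes on hashing", corpus_hashes))

# Integrity check: a one-character change yields a different hash code.
original = fingerprint(b"pay Alice 10 dollars")
tampered = fingerprint(b"pay Alice 90 dollars")
print(original == tampered)
```

Only the fixed-length hash codes need to be stored or exchanged, regardless of how large the documents are.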
