SlideShare a Scribd company logo
HASHING
• How to store Employee records, which consists of Telephone no.,
Name, City, Department, Salary, etc…
Telephone Name City Dept. Salary
2
Solution
• Use an Array:
• Search takes a linear time.
• If stored in sorted order this search can be done in O(log n) using
binary search but then insertion and deletion becomes costly because
we have to maintain the sorted order.
3
Solution
• Use an Linked List:
• Search takes again linear time to search for particular record as se need to go
through all the records until the required one is found.
• Use balanced BST to store records:
• Insertion takes O(log n).
• Search takes O(log n).
• Deletion takes O(log n).
4
Solution
• Create a Direct Access Table:
• Insertion takes O(1).
• Search takes O(1).
• Deletion takes O(1).
5
• Limitations:
• 1] Size of Table: m * 10^n
• Where, m = size of pointer to the record
n = digits in the telephone number
Telephone Memory Location
(Pointer)
…………………….
9856234178 0x1…
8695234512 0x2…
…………………….
Name City Dept …….
Tejas Pune IT
Name City Dept …….
Rahul Mumbai HR
• Limitations:
• 2] Integer may not hold the
size of n digits
6
Introduction to Hashing
• Provides O(1) time on average for insert, search and delete.
• Find items with keys matching a given search key
• Given an array A, containing n keys, and a search key x, find the
index i such as x=A[i]
• Example of Record
KEY Other Data
7
Continue..
• Hashing is an important Data Structure which is designed to use a
special function called the Hash function.
• Hash function is used to map a given value with a particular key for
faster access of elements.
• The efficiency of mapping depends of the efficiency of the hash
function used.
• Some examples of how hashing is used in our lives include:
• In universities, each student is assigned a unique roll number that can be
used to retrieve information about them.
• In libraries, each book is assigned a unique number that can be used to
determine information about the book, such as its exact position in the
library or the users it has been issued to etc.
8
Continue..
• Assume that you have an object and you want to assign a key to it
to make searching easy.
• To store the key/value pair, you can use a simple array like a data
structure where keys (integers) can be used directly as an index to
store values.
• However, in cases where the keys are large and cannot be used
directly as an index, you should use hashing.
9
Continue..
• In hashing, large keys are converted into small keys by using hash
functions.
• The values are then stored in a data structure called hash table.
• The idea of hashing is to distribute entries (key/value pairs)
uniformly across an array.
• Each element is assigned a key (converted key).
• By using that key you can access the element in O(1) time. Using
the key, the algorithm (hash function) computes an index that
suggests where an entry can be found or inserted.
10
Important Keywords:
• Hash Table:
• A data structure used for storing and retrieving data quickly.
Where each entry in hash table is made using hash function.
• Hash Function:
• It maps a big number or string into a small integer that can be
used as index in hash table.
• Used to place data in hash table and also to retrieve data
from hash table.
• A function that can be used to map a data set of an arbitrary
size to a data set of a fixed size, which falls into the hash
table.
• The values returned by a hash function are called hash values,
hash codes, hash sums, or simply hashes.
• Bucket:
• Each position of hash table is called bucket.
• Collision:
• It is a situation in which hash function returns the same
address for more than one record.
0 25
1 Bucket 16
2 42
3 33
4 54
Given: 25, 33, 54, 16, 42 and table
of size 5.
Hash Function is KEY mod 5
Key mod 5
25 mod 5 = 0
33 mod 5 = 3
54 mod 5 = 4
16 mod 5 = 1
42 mod 5 = 2
30 mod 5 = 0
This is collision.
Hash Table
11
Important Keywords:
• Probe:
• Each calculation of an address and test for success
is known as probe.
• Synonym:
• The set of keys that has the same location are
called synonyms.
• Ex: 25 and 30 are the synonyms.
• Overflow:
• When hash table becomes full and new record
needs to be inserted and resulting into more
collisions the it is called overflow.
• Perfect hash function:
• It is a function that maps distinct key elements into
the hash table with no collisions.
0 25
1 16
2 42
3 33
4 54
Entry of 25, 33, 54, 16, 42 in a table
of size 5 with probe 5
30
Where 30 mod 5 = 0
is a collision, synonym for 25 and it
also shows the overflow situation
Hash Table
Consider 25, 33, 54, 16, 42, 30.
Show Perfect Hash Function
12
• To achieve a good hashing mechanism, It is important to
have a good hash function with the following basic
requirements:
• Easy to compute
• Uniform distribution
• Less Collision
13
Important Keywords:
14
Types of Hash Functions
• Division Method
• Multiplication Method
• Extraction Method
• Mid Square Method
• Folding and Universal Method
15
Division Method
• It depends upon the remainder of division.
• Typically the divisor is table length.
• Ex: 54, 72, 89, 37 is to be placed in hash table & if table size 10.
0
1
2
3
4
5
6
7
8
9
Hash Function
h(key) = record % table_size
54 % 10 = 4
72 % 10 = 2
89 % 10 = 9
37 % 10 = 7
54
72
89
37
Disadvantage:
The consecutive keys map to
consecutive hash values in the
hash table. This leads to a poor
performance.
16
Multiplication Method
17
Extraction Method
• Some digits are extracted from the key to form the address location in
hash table.
• Ex:
Suppose 1st, 3rd and 4th digits from left are selected for hash key.
4 9 7 8 2 4
478
At 478 location in the
hash table of size 1000
the key can be stored.
18
Mid Square Method
• Square the key.
• Then extract middle part of the result.
• That will be the location of key element in hash table.
• Ex:
A] 3111 = Square(3111) B] 16 = Square(16)
9678321
At 783 location in the
hash table of size 1000
the key can be stored.
256
At 56 location in
the hash table.
19
Folding Method
• Two folding techniques:
• A] Fold Shift:
The key is divided into separate
parts based on size of required address.
Finally add those parts.
• Ex: 345678123
• And required address is of 3 digits.
• B] Fold Boundary:
The key is divided into separate
parts based on size of required address.
Then left & right parts are folded and
then add those parts.
• Ex: 345678123
345678123
345
678
123
-------
1146
345678123
543
678
321
-------
1542
20
Universal Hashing
• In this technique randomized algorithm is used to select the hash
function at random from family of hash functions with the help of
certain mathematical properties.
• Let U is the set of Universal Keys and H be the finite collection of Hash
functions mapping U into {0, 1, …….., m-1}.
• Say there is collision or your hash function perform badly for input
key or a pair (x, y) i.e. { (x,y) | h(x)=h(y)}.
U
0
.
.
m-1
Bad portion
of key
21
Universal Hashing
• The fundamental weakness of hash functions is collision so to avoid
that what we do we can choose the Universal hashing where the hash
function randomly at the runtime.
• Lets consider H = set of all hash functions h from U to {0,1,…,m-1}.
{ h | h : U  (0,1,…,m-1) }
H
Bad portion
22
Universal Hashing
23
24
Properties of Good Hash Functions
• Rules for choosing good hash function:
1. The hash function should be simple to compute.
2. Number of collisions should be less ideally no collision should occur (perfect
hash function).
3. Hash function should produce such keys which will get distributed uniformly
over an array.
4. The hash function should depend on every bit of the key and not on the
extracted portion of the key.
25
Collision Resolution Strategies
26
Separate chaining (open hashing)
• The open hashing is also known as separate chaining in which
collisions are stored outside the table.
• Separate chaining is one of the most commonly used collision
resolution techniques.
• It is usually implemented using linked lists. In separate chaining, each
element of the hash table is a linked list.
• To store an element in the hash table you must insert it into a specific
linked list.
• If there is any collision (i.e. two different elements have same hash
value) then store both the elements in the same linked list.
27
Input:
81, 0, 64, 25, 1, 36, 4, 49, 16, 9.
28
Advantages:
1. Simple to implement.
2. Hash table never fills up, we can always add more elements to chain.
3. Less sensitive to the hash function.
4. It is mostly used when it is unknown how many and how frequently
keys may be inserted or deleted
Disadvantages:
1. Wastage of space.
2. Cache performance of chaining is not good as keys are stored using
linked list.
3. If the chain becomes long then search time can become O(n) in worst
case.
4. Uses extra space for links.
29
Open Addressing
• A] Linear Probing with chaining (without
replacement):
• A separate chain table is maintained for
colliding data.
• The address of colliding data is stored with
first colliding element in chain table.
• We can place element at next available
bucket or slot and will keep track of that in
chain column.
Ex: 131, 3, 4, 21, 61, 6, 71, 8, 9.
Index Data Chain
0 -1
1 -1
2 -1
3 -1
4 -1
5 -1
6 -1
7 -1
8 -1
9 -1
131
3
4
21 mod 10 = 1 Collision
21
2
61 mod 10 = 1 Collision
61
5
6
71
7
8
9
71 mod 10 = 1 Collision
-1
Index Data Chain
0 -1 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 61 7
6 6 -1
7 71 -1
8 8 -1
9 9 -1
• Drawback: Finding the next empty location and that results into the element actually belonging to
that empty location cannot obtain its location.
30
Linear Probing with chaining (without replacement)
Ex: Consider elements as 21, 31, 41, 2, 6, 5, 7, 55 with
table size of 10.
Index Data Chain
0 -1
1 -1
2 -1
3 -1
4 -1
5 -1
6 -1
7 -1
8 -1
9 -1
21
41
2
31
2
5
3
6
-1
Index Data Chain
0 -1 -1
1 21 2
2 31 3
3 41 4
4 2 -1
5 5 8
6 6 -1
7 7 -1
8 55 -1
9 -1 -1
7
55
-1
4
8
31
B]Linear Probing with chaining (with replacement)
• Ex: 131, 21, 31, 4, 5, 2.
Index Data Chain
0 -1
1 -1
2 -1
3 -1
4 -1
5 -1
6 -1
7 -1
8 -1
9 -1
131
31
4
21 mod 10 = 1 Collision
21
2
31 mod 10 = 1 Collision
5
3
2 mod 10 = 2
Index Data Chain
0 -1
1 131 2
2
3 31 -1
4 4 -1
5 5 -1
6
7 -1
8 -1
9 -1
Replace 21 by 2 and
accordingly chain table will
be updated.
2 21 3
21 3
-1
6
Index Data Chain
0 -1 -1
1 131 6
2 2 -1
3 31 -1
4 4 -1
5 5 -1
6 21 3
7 -1 -1
8 -1 -1
9 -1 -1
32
B]Linear Probing with chaining (with replacement)
Ex: 131, 3, 4, 21, 61, 6, 71, 16, 85.
Index Data Chain
0 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 61 7
6 6 -1
7 71 -1
8 -1
9 -1
16 mod 10 = 6 85 mod 10 = 5
Index Data Chain
0 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 61 7
6 6 8
7 71 -1
8 16 -1
9 -1
Index Data Chain
0 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 85 -1
6 6 8
7 71 -1
8 16 -1
9 61 7
131, 3, 4, 21, 61, 6, 71
33
Quadratic Probing
• Problem of clustering
• Accumulation of Consecutive cells n hash table
• When we want to add element we have to go for
many probes means need to search for more
elements to put element in table.
Index Data
0 79
1 41
2 22
3 32
4 44
5 33
6
7
8 18
9 59
• Solution: Quadratic Probing
34
Clustering Example
• Ex: 2, 22, 12, 42, 32
Index Data Chain
0 -1
1 -1
2 2 3
3 22 4
4 12 5
5 42 6
6 32 -1
7 -1
8 -1
9 -1
Index Data
0
1
2 2
3
4
5
6
7
8
9
[Hash(key)+i^2] mod M
(2 + 0^2)mod 10 = 2
(2 + 1^2)mod 10 = 3
22
12
(2 + 2^2)mod 10 = 6
(2 + 3^2)mod 10 = 11  1
42
(2 + 4^2)mod 10 = 18  8
32
35
Quadratic Probing
• Example: 37, 90, 55, 22, 11, 17, 49, 87 with m = 10. Index Data
0 90
1 11
2 22
3
4
5 55
6
7 37
8
9
17
49
87
36
Clustering Example
• Ex: 2, 12, 22, 32, 42, 52, 62
Index Data
0
1 32
2 2
3 12
4
5
6 22
7 52
8 42
9
[Hash(key)+i^2] mod M
(2 + 6^2)mod 10 = 38  8
(2 + 4^2)mod 10 = 18  8
(2 + 7^2)mod 10 = 51  1
(2 + 8^2)mod 10 = 66  6
(2 + 9^2)mod 10 = 83  3
(2 + 10^2)mod 10 = 102  2
This is known as secondary
clustering.
Here we are unable to add
an element in quadrating
probing because the index
calculated is never empty.
37
Double Hashing
• Second hash function is applied to the key when collision occurs.
38
Example
K H1 H2
18
41
22
44
5 7 - (18 % 7) = 7 - 4 = 3
2 1
9 6
5 5
39
Example
K H1 H2
18 5 3
41 2 1
22 9 6
44 5 5
Index Data
0
1
2
3
4
5
6
7
8
9
10
11
12
18
For 41 = 2 + 0 * 1 = 2
41
For 22 = 9 + 0 * 6 = 9 22
For 44 = 5 + 0 * 5 = 5
For 44 = 5 + 1 * 5 = 10
42
40
Home Work
41

More Related Content

PPTX
Priority Queue in Data Structure
PPTX
daa-unit-3-greedy method
PPT
Sum of subsets problem by backtracking 
PPTX
Knowledge Representation, Inference and Reasoning
PPTX
Binary Search Tree
PDF
Searching and Sorting Techniques in Data Structure
PPTX
Process synchronization in Operating Systems
PPTX
Mathematical Analysis of Non-Recursive Algorithm.
Priority Queue in Data Structure
daa-unit-3-greedy method
Sum of subsets problem by backtracking 
Knowledge Representation, Inference and Reasoning
Binary Search Tree
Searching and Sorting Techniques in Data Structure
Process synchronization in Operating Systems
Mathematical Analysis of Non-Recursive Algorithm.

What's hot (20)

PPT
PDF
Automata theory
PPTX
Basic theorems and properties of boolean algebra
PPTX
Doubly Linked List
PPTX
Hamming code system
PDF
sparse matrix in data structure
PPTX
heap Sort Algorithm
PPTX
Linear Search Presentation
PPTX
Hashing Technique In Data Structures
PPTX
Graph coloring using backtracking
PPTX
A Nutshell On Convolutional Codes (Representations)
PPTX
Recursive Descent Parsing
PPTX
Strings in C
PPTX
Circular link list.ppt
PPT
Principles of soft computing-Associative memory networks
PPTX
Graph traversals in Data Structures
PDF
L8 fuzzy relations contd.
PPTX
A Role of Lexical Analyzer
PPTX
4. block coding
PDF
DATA STRUCTURE AND ALGORITHM FULL NOTES
Automata theory
Basic theorems and properties of boolean algebra
Doubly Linked List
Hamming code system
sparse matrix in data structure
heap Sort Algorithm
Linear Search Presentation
Hashing Technique In Data Structures
Graph coloring using backtracking
A Nutshell On Convolutional Codes (Representations)
Recursive Descent Parsing
Strings in C
Circular link list.ppt
Principles of soft computing-Associative memory networks
Graph traversals in Data Structures
L8 fuzzy relations contd.
A Role of Lexical Analyzer
4. block coding
DATA STRUCTURE AND ALGORITHM FULL NOTES
Ad

Similar to Hashing techniques, Hashing function,Collision detection techniques (20)

PPTX
Data Structures-Topic-Hashing, Collision
PDF
LECT 10, 11-DSALGO(Hashing).pdf
PPTX
Lec12-Hash-Tables-27122022-125641pm.pptx
PPTX
Working with python Nice PPT must try very good
PPTX
hashing in data structures and its applications
PPTX
Hashing .pptx
PPTX
HASHING IS NOT YASH IT IS HASH.pptx
PPTX
Unit viii searching and hashing
PPTX
Hashing and Collision Advanced data structure and algorithm
PPT
Hashing Techniques in Data Strucures and Algorithm
PDF
hashtableeeeeeeeeeeeeeeeeeeeeeeeeeee.pdf
PPTX
Hashing And Hashing Tables
PDF
DataBaseManagementSystems-BTECH--UNIT-5.pdf
PPTX
Hashing
PPTX
hashing in data structure for Btech.pptx
PPTX
hashing in data structure for Btech .pptx
PPTX
hashing in data structure for engineering.pptx
PPTX
Unit 8 searching and hashing
Data Structures-Topic-Hashing, Collision
LECT 10, 11-DSALGO(Hashing).pdf
Lec12-Hash-Tables-27122022-125641pm.pptx
Working with python Nice PPT must try very good
hashing in data structures and its applications
Hashing .pptx
HASHING IS NOT YASH IT IS HASH.pptx
Unit viii searching and hashing
Hashing and Collision Advanced data structure and algorithm
Hashing Techniques in Data Strucures and Algorithm
hashtableeeeeeeeeeeeeeeeeeeeeeeeeeee.pdf
Hashing And Hashing Tables
DataBaseManagementSystems-BTECH--UNIT-5.pdf
Hashing
hashing in data structure for Btech.pptx
hashing in data structure for Btech .pptx
hashing in data structure for engineering.pptx
Unit 8 searching and hashing
Ad

Recently uploaded (20)

PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
Lecture Notes Electrical Wiring System Components
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
web development for engineering and engineering
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
Sustainable Sites - Green Building Construction
PPTX
Construction Project Organization Group 2.pptx
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
Internet of Things (IOT) - A guide to understanding
DOCX
573137875-Attendance-Management-System-original
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
Structs to JSON How Go Powers REST APIs.pdf
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
CH1 Production IntroductoryConcepts.pptx
Lecture Notes Electrical Wiring System Components
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
UNIT 4 Total Quality Management .pptx
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
web development for engineering and engineering
bas. eng. economics group 4 presentation 1.pptx
Model Code of Practice - Construction Work - 21102022 .pdf
Sustainable Sites - Green Building Construction
Construction Project Organization Group 2.pptx
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Internet of Things (IOT) - A guide to understanding
573137875-Attendance-Management-System-original
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Structs to JSON How Go Powers REST APIs.pdf
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx

Hashing techniques, Hashing function,Collision detection techniques

  • 2. • How to store Employee records, which consists of Telephone no., Name, City, Department, Salary, etc… Telephone Name City Dept. Salary 2
  • 3. Solution • Use an Array: • Search takes a linear time. • If stored in sorted order this search can be done in O(log n) using binary search but then insertion and deletion becomes costly because we have to maintain the sorted order. 3
  • 4. Solution • Use an Linked List: • Search takes again linear time to search for particular record as se need to go through all the records until the required one is found. • Use balanced BST to store records: • Insertion takes O(log n). • Search takes O(log n). • Deletion takes O(log n). 4
  • 5. Solution • Create a Direct Access Table: • Insertion takes O(1). • Search takes O(1). • Deletion takes O(1). 5
  • 6. • Limitations: • 1] Size of Table: m * 10^n • Where, m = size of pointer to the record n = digits in the telephone number Telephone Memory Location (Pointer) ……………………. 9856234178 0x1… 8695234512 0x2… ……………………. Name City Dept ……. Tejas Pune IT Name City Dept ……. Rahul Mumbai HR • Limitations: • 2] Integer may not hold the size of n digits 6
  • 7. Introduction to Hashing • Provides O(1) time on average for insert, search and delete. • Find items with keys matching a given search key • Given an array A, containing n keys, and a search key x, find the index i such as x=A[i] • Example of Record KEY Other Data 7
  • 8. Continue.. • Hashing is an important Data Structure which is designed to use a special function called the Hash function. • Hash function is used to map a given value with a particular key for faster access of elements. • The efficiency of mapping depends of the efficiency of the hash function used. • Some examples of how hashing is used in our lives include: • In universities, each student is assigned a unique roll number that can be used to retrieve information about them. • In libraries, each book is assigned a unique number that can be used to determine information about the book, such as its exact position in the library or the users it has been issued to etc. 8
  • 9. Continue.. • Assume that you have an object and you want to assign a key to it to make searching easy. • To store the key/value pair, you can use a simple array like a data structure where keys (integers) can be used directly as an index to store values. • However, in cases where the keys are large and cannot be used directly as an index, you should use hashing. 9
  • 10. Continue.. • In hashing, large keys are converted into small keys by using hash functions. • The values are then stored in a data structure called hash table. • The idea of hashing is to distribute entries (key/value pairs) uniformly across an array. • Each element is assigned a key (converted key). • By using that key you can access the element in O(1) time. Using the key, the algorithm (hash function) computes an index that suggests where an entry can be found or inserted. 10
  • 11. Important Keywords: • Hash Table: • A data structure used for storing and retrieving data quickly. Where each entry in hash table is made using hash function. • Hash Function: • It maps a big number or string into a small integer that can be used as index in hash table. • Used to place data in hash table and also to retrieve data from hash table. • A function that can be used to map a data set of an arbitrary size to a data set of a fixed size, which falls into the hash table. • The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes. • Bucket: • Each position of hash table is called bucket. • Collision: • It is a situation in which hash function returns the same address for more than one record. 0 25 1 Bucket 16 2 42 3 33 4 54 Given: 25, 33, 54, 16, 42 and table of size 5. Hash Function is KEY mod 5 Key mod 5 25 mod 5 = 0 33 mod 5 = 3 54 mod 5 = 4 16 mod 5 = 1 42 mod 5 = 2 30 mod 5 = 0 This is collision. Hash Table 11
  • 12. Important Keywords: • Probe: • Each calculation of an address and test for success is known as probe. • Synonym: • The set of keys that has the same location are called synonyms. • Ex: 25 and 30 are the synonyms. • Overflow: • When hash table becomes full and new record needs to be inserted and resulting into more collisions the it is called overflow. • Perfect hash function: • It is a function that maps distinct key elements into the hash table with no collisions. 0 25 1 16 2 42 3 33 4 54 Entry of 25, 33, 54, 16, 42 in a table of size 5 with probe 5 30 Where 30 mod 5 = 0 is a collision, synonym for 25 and it also shows the overflow situation Hash Table Consider 25, 33, 54, 16, 42, 30. Show Perfect Hash Function 12
  • 13. • To achieve a good hashing mechanism, It is important to have a good hash function with the following basic requirements: • Easy to compute • Uniform distribution • Less Collision 13
  • 15. Types of Hash Functions • Division Method • Multiplication Method • Extraction Method • Mid Square Method • Folding and Universal Method 15
  • 16. Division Method • It depends upon the remainder of division. • Typically the divisor is table length. • Ex: 54, 72, 89, 37 is to be placed in hash table & if table size 10. 0 1 2 3 4 5 6 7 8 9 Hash Function h(key) = record % table_size 54 % 10 = 4 72 % 10 = 2 89 % 10 = 9 37 % 10 = 7 54 72 89 37 Disadvantage: The consecutive keys map to consecutive hash values in the hash table. This leads to a poor performance. 16
  • 18. Extraction Method • Some digits are extracted from the key to form the address location in hash table. • Ex: Suppose 1st, 3rd and 4th digits from left are selected for hash key. 4 9 7 8 2 4 478 At 478 location in the hash table of size 1000 the key can be stored. 18
  • 19. Mid Square Method • Square the key. • Then extract middle part of the result. • That will be the location of key element in hash table. • Ex: A] 3111 = Square(3111) B] 16 = Square(16) 9678321 At 783 location in the hash table of size 1000 the key can be stored. 256 At 56 location in the hash table. 19
  • 20. Folding Method • Two folding techniques: • A] Fold Shift: The key is divided into separate parts based on size of required address. Finally add those parts. • Ex: 345678123 • And required address is of 3 digits. • B] Fold Boundary: The key is divided into separate parts based on size of required address. Then left & right parts are folded and then add those parts. • Ex: 345678123 345678123 345 678 123 ------- 1146 345678123 543 678 321 ------- 1542 20
  • 21. Universal Hashing • In this technique randomized algorithm is used to select the hash function at random from family of hash functions with the help of certain mathematical properties. • Let U is the set of Universal Keys and H be the finite collection of Hash functions mapping U into {0, 1, …….., m-1}. • Say there is collision or your hash function perform badly for input key or a pair (x, y) i.e. { (x,y) | h(x)=h(y)}. U 0 . . m-1 Bad portion of key 21
  • 22. Universal Hashing • The fundamental weakness of hash functions is collision so to avoid that what we do we can choose the Universal hashing where the hash function randomly at the runtime. • Lets consider H = set of all hash functions h from U to {0,1,…,m-1}. { h | h : U  (0,1,…,m-1) } H Bad portion 22
  • 24. 24
  • 25. Properties of Good Hash Functions • Rules for choosing good hash function: 1. The hash function should be simple to compute. 2. Number of collisions should be less ideally no collision should occur (perfect hash function). 3. Hash function should produce such keys which will get distributed uniformly over an array. 4. The hash function should depend on every bit of the key and not on the extracted portion of the key. 25
  • 27. Separate chaining (open hashing) • The open hashing is also known as separate chaining in which collisions are stored outside the table. • Separate chaining is one of the most commonly used collision resolution techniques. • It is usually implemented using linked lists. In separate chaining, each element of the hash table is a linked list. • To store an element in the hash table you must insert it into a specific linked list. • If there is any collision (i.e. two different elements have same hash value) then store both the elements in the same linked list. 27
  • 28. Input: 81, 0, 64, 25, 1, 36, 4, 49, 16, 9. 28
  • 29. Advantages: 1. Simple to implement. 2. Hash table never fills up, we can always add more elements to chain. 3. Less sensitive to the hash function. 4. It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted Disadvantages: 1. Wastage of space. 2. Cache performance of chaining is not good as keys are stored using linked list. 3. If the chain becomes long then search time can become O(n) in worst case. 4. Uses extra space for links. 29
  • 30. Open Addressing • A] Linear Probing with chaining (without replacement): • A separate chain table is maintained for colliding data. • The address of colliding data is stored with first colliding element in chain table. • We can place element at next available bucket or slot and will keep track of that in chain column. Ex: 131, 3, 4, 21, 61, 6, 71, 8, 9. Index Data Chain 0 -1 1 -1 2 -1 3 -1 4 -1 5 -1 6 -1 7 -1 8 -1 9 -1 131 3 4 21 mod 10 = 1 Collision 21 2 61 mod 10 = 1 Collision 61 5 6 71 7 8 9 71 mod 10 = 1 Collision -1 Index Data Chain 0 -1 -1 1 131 2 2 21 5 3 3 -1 4 4 -1 5 61 7 6 6 -1 7 71 -1 8 8 -1 9 9 -1 • Drawback: Finding the next empty location and that results into the element actually belonging to that empty location cannot obtain its location. 30
  • 31. Linear Probing with chaining (without replacement) Ex: Consider elements as 21, 31, 41, 2, 6, 5, 7, 55 with table size of 10. Index Data Chain 0 -1 1 -1 2 -1 3 -1 4 -1 5 -1 6 -1 7 -1 8 -1 9 -1 21 41 2 31 2 5 3 6 -1 Index Data Chain 0 -1 -1 1 21 2 2 31 3 3 41 4 4 2 -1 5 5 8 6 6 -1 7 7 -1 8 55 -1 9 -1 -1 7 55 -1 4 8 31
  • 32. B]Linear Probing with chaining (with replacement) • Ex: 131, 21, 31, 4, 5, 2. Index Data Chain 0 -1 1 -1 2 -1 3 -1 4 -1 5 -1 6 -1 7 -1 8 -1 9 -1 131 31 4 21 mod 10 = 1 Collision 21 2 31 mod 10 = 1 Collision 5 3 2 mod 10 = 2 Index Data Chain 0 -1 1 131 2 2 3 31 -1 4 4 -1 5 5 -1 6 7 -1 8 -1 9 -1 Replace 21 by 2 and accordingly chain table will be updated. 2 21 3 21 3 -1 6 Index Data Chain 0 -1 -1 1 131 6 2 2 -1 3 31 -1 4 4 -1 5 5 -1 6 21 3 7 -1 -1 8 -1 -1 9 -1 -1 32
  • 33. B]Linear Probing with chaining (with replacement) Ex: 131, 3, 4, 21, 61, 6, 71, 16, 85. Index Data Chain 0 -1 1 131 2 2 21 5 3 3 -1 4 4 -1 5 61 7 6 6 -1 7 71 -1 8 -1 9 -1 16 mod 10 = 6 85 mod 10 = 5 Index Data Chain 0 -1 1 131 2 2 21 5 3 3 -1 4 4 -1 5 61 7 6 6 8 7 71 -1 8 16 -1 9 -1 Index Data Chain 0 -1 1 131 2 2 21 5 3 3 -1 4 4 -1 5 85 -1 6 6 8 7 71 -1 8 16 -1 9 61 7 131, 3, 4, 21, 61, 6, 71 33
  • 34. Quadratic Probing • Problem of clustering • Accumulation of Consecutive cells n hash table • When we want to add element we have to go for many probes means need to search for more elements to put element in table. Index Data 0 79 1 41 2 22 3 32 4 44 5 33 6 7 8 18 9 59 • Solution: Quadratic Probing 34
  • 35. Clustering Example • Ex: 2, 22, 12, 42, 32 Index Data Chain 0 -1 1 -1 2 2 3 3 22 4 4 12 5 5 42 6 6 32 -1 7 -1 8 -1 9 -1 Index Data 0 1 2 2 3 4 5 6 7 8 9 [Hash(key)+i^2] mod M (2 + 0^2)mod 10 = 2 (2 + 1^2)mod 10 = 3 22 12 (2 + 2^2)mod 10 = 6 (2 + 3^2)mod 10 = 11  1 42 (2 + 4^2)mod 10 = 18  8 32 35
  • 36. Quadratic Probing • Example: 37, 90, 55, 22, 11, 17, 49, 87 with m = 10. Index Data 0 90 1 11 2 22 3 4 5 55 6 7 37 8 9 17 49 87 36
  • 37. Clustering Example • Ex: 2, 12, 22, 32, 42, 52, 62 Index Data 0 1 32 2 2 3 12 4 5 6 22 7 52 8 42 9 [Hash(key)+i^2] mod M (2 + 6^2)mod 10 = 38  8 (2 + 4^2)mod 10 = 18  8 (2 + 7^2)mod 10 = 51  1 (2 + 8^2)mod 10 = 66  6 (2 + 9^2)mod 10 = 83  3 (2 + 10^2)mod 10 = 102  2 This is known as secondary clustering. Here we are unable to add an element in quadrating probing because the index calculated is never empty. 37
  • 38. Double Hashing • Second hash function is applied to the key when collision occurs. 38
  • 39. Example K H1 H2 18 41 22 44 5 7 - (18 % 7) = 7 - 4 = 3 2 1 9 6 5 5 39
  • 40. Example K H1 H2 18 5 3 41 2 1 22 9 6 44 5 5 Index Data 0 1 2 3 4 5 6 7 8 9 10 11 12 18 For 41 = 2 + 0 * 1 = 2 41 For 22 = 9 + 0 * 6 = 9 22 For 44 = 5 + 0 * 5 = 5 For 44 = 5 + 1 * 5 = 10 42 40

Editor's Notes

  • #27: Collision resolution is the most important issue in hash table implementations.