Hashing techniques, Hashing function,Collision detection techniques

• How to store Employee records, which consists of Telephone no.,
Name, City, Department, Salary, etc…
Telephone Name City Dept. Salary
2

Solution
• Use an Array:
• Search takes a linear time.
• If stored in sorted order this search can be done in O(log n) using
binary search but then insertion and deletion becomes costly because
we have to maintain the sorted order.
3

Solution
• Use an Linked List:
• Search takes again linear time to search for particular record as se need to go
through all the records until the required one is found.
• Use balanced BST to store records:
• Insertion takes O(log n).
• Search takes O(log n).
• Deletion takes O(log n).
4

Solution
• Create a Direct Access Table:
• Insertion takes O(1).
• Search takes O(1).
• Deletion takes O(1).
5

• Limitations:
• 1] Size of Table: m * 10^n
• Where, m = size of pointer to the record
n = digits in the telephone number
Telephone Memory Location
(Pointer)
…………………….
9856234178 0x1…
8695234512 0x2…
…………………….
Name City Dept …….
Tejas Pune IT
Name City Dept …….
Rahul Mumbai HR
• Limitations:
• 2] Integer may not hold the
size of n digits
6

Introduction to Hashing
• Provides O(1) time on average for insert, search and delete.
• Find items with keys matching a given search key
• Given an array A, containing n keys, and a search key x, find the
index i such as x=A[i]
• Example of Record
KEY Other Data
7

Continue..
• Hashing is an important Data Structure which is designed to use a
special function called the Hash function.
• Hash function is used to map a given value with a particular key for
faster access of elements.
• The efficiency of mapping depends of the efficiency of the hash
function used.
• Some examples of how hashing is used in our lives include:
• In universities, each student is assigned a unique roll number that can be
used to retrieve information about them.
• In libraries, each book is assigned a unique number that can be used to
determine information about the book, such as its exact position in the
library or the users it has been issued to etc.
8

Continue..
• Assume that you have an object and you want to assign a key to it
to make searching easy.
• To store the key/value pair, you can use a simple array like a data
structure where keys (integers) can be used directly as an index to
store values.
• However, in cases where the keys are large and cannot be used
directly as an index, you should use hashing.
9

Continue..
• In hashing, large keys are converted into small keys by using hash
functions.
• The values are then stored in a data structure called hash table.
• The idea of hashing is to distribute entries (key/value pairs)
uniformly across an array.
• Each element is assigned a key (converted key).
• By using that key you can access the element in O(1) time. Using
the key, the algorithm (hash function) computes an index that
suggests where an entry can be found or inserted.
10

Important Keywords:
• Hash Table:
• A data structure used for storing and retrieving data quickly.
Where each entry in hash table is made using hash function.
• Hash Function:
• It maps a big number or string into a small integer that can be
used as index in hash table.
• Used to place data in hash table and also to retrieve data
from hash table.
• A function that can be used to map a data set of an arbitrary
size to a data set of a fixed size, which falls into the hash
table.
• The values returned by a hash function are called hash values,
hash codes, hash sums, or simply hashes.
• Bucket:
• Each position of hash table is called bucket.
• Collision:
• It is a situation in which hash function returns the same
address for more than one record.
0 25
1 Bucket 16
2 42
3 33
4 54
Given: 25, 33, 54, 16, 42 and table
of size 5.
Hash Function is KEY mod 5
Key mod 5
25 mod 5 = 0
33 mod 5 = 3
54 mod 5 = 4
16 mod 5 = 1
42 mod 5 = 2
30 mod 5 = 0
This is collision.
Hash Table
11

Important Keywords:
• Probe:
• Each calculation of an address and test for success
is known as probe.
• Synonym:
• The set of keys that has the same location are
called synonyms.
• Ex: 25 and 30 are the synonyms.
• Overflow:
• When hash table becomes full and new record
needs to be inserted and resulting into more
collisions the it is called overflow.
• Perfect hash function:
• It is a function that maps distinct key elements into
the hash table with no collisions.
0 25
1 16
2 42
3 33
4 54
Entry of 25, 33, 54, 16, 42 in a table
of size 5 with probe 5
30
Where 30 mod 5 = 0
is a collision, synonym for 25 and it
also shows the overflow situation
Hash Table
Consider 25, 33, 54, 16, 42, 30.
Show Perfect Hash Function
12

• To achieve a good hashing mechanism, It is important to
have a good hash function with the following basic
requirements:
• Easy to compute
• Uniform distribution
• Less Collision
13

Types of Hash Functions
• Division Method
• Multiplication Method
• Extraction Method
• Mid Square Method
• Folding and Universal Method
15

Division Method
• It depends upon the remainder of division.
• Typically the divisor is table length.
• Ex: 54, 72, 89, 37 is to be placed in hash table & if table size 10.
0
1
2
3
4
5
6
7
8
9
Hash Function
h(key) = record % table_size
54 % 10 = 4
72 % 10 = 2
89 % 10 = 9
37 % 10 = 7
54
72
89
37
Disadvantage:
The consecutive keys map to
consecutive hash values in the
hash table. This leads to a poor
performance.
16

Extraction Method
• Some digits are extracted from the key to form the address location in
hash table.
• Ex:
Suppose 1st, 3rd and 4th digits from left are selected for hash key.
4 9 7 8 2 4
478
At 478 location in the
hash table of size 1000
the key can be stored.
18

Mid Square Method
• Square the key.
• Then extract middle part of the result.
• That will be the location of key element in hash table.
• Ex:
A] 3111 = Square(3111) B] 16 = Square(16)
9678321
At 783 location in the
hash table of size 1000
the key can be stored.
256
At 56 location in
the hash table.
19

Folding Method
• Two folding techniques:
• A] Fold Shift:
The key is divided into separate
parts based on size of required address.
Finally add those parts.
• Ex: 345678123
• And required address is of 3 digits.
• B] Fold Boundary:
The key is divided into separate
parts based on size of required address.
Then left & right parts are folded and
then add those parts.
• Ex: 345678123
345678123
345
678
123
-------
1146
345678123
543
678
321
-------
1542
20

Universal Hashing
• In this technique randomized algorithm is used to select the hash
function at random from family of hash functions with the help of
certain mathematical properties.
• Let U is the set of Universal Keys and H be the finite collection of Hash
functions mapping U into {0, 1, …….., m-1}.
• Say there is collision or your hash function perform badly for input
key or a pair (x, y) i.e. { (x,y) | h(x)=h(y)}.
U
0
.
.
m-1
Bad portion
of key
21

Universal Hashing
• The fundamental weakness of hash functions is collision so to avoid
that what we do we can choose the Universal hashing where the hash
function randomly at the runtime.
• Lets consider H = set of all hash functions h from U to {0,1,…,m-1}.
{ h | h : U  (0,1,…,m-1) }
H
Bad portion
22

Properties of Good Hash Functions
• Rules for choosing good hash function:
1. The hash function should be simple to compute.
2. Number of collisions should be less ideally no collision should occur (perfect
hash function).
3. Hash function should produce such keys which will get distributed uniformly
over an array.
4. The hash function should depend on every bit of the key and not on the
extracted portion of the key.
25

Collision Resolution Strategies
26

Separate chaining (open hashing)
• The open hashing is also known as separate chaining in which
collisions are stored outside the table.
• Separate chaining is one of the most commonly used collision
resolution techniques.
• It is usually implemented using linked lists. In separate chaining, each
element of the hash table is a linked list.
• To store an element in the hash table you must insert it into a specific
linked list.
• If there is any collision (i.e. two different elements have same hash
value) then store both the elements in the same linked list.
27

Input:
81, 0, 64, 25, 1, 36, 4, 49, 16, 9.
28

Advantages:
1. Simple to implement.
2. Hash table never fills up, we can always add more elements to chain.
3. Less sensitive to the hash function.
4. It is mostly used when it is unknown how many and how frequently
keys may be inserted or deleted
Disadvantages:
1. Wastage of space.
2. Cache performance of chaining is not good as keys are stored using
linked list.
3. If the chain becomes long then search time can become O(n) in worst
case.
4. Uses extra space for links.
29

Open Addressing
• A] Linear Probing with chaining (without
replacement):
• A separate chain table is maintained for
colliding data.
• The address of colliding data is stored with
first colliding element in chain table.
• We can place element at next available
bucket or slot and will keep track of that in
chain column.
Ex: 131, 3, 4, 21, 61, 6, 71, 8, 9.
Index Data Chain
0 -1
1 -1
2 -1
3 -1
4 -1
5 -1
6 -1
7 -1
8 -1
9 -1
131
3
4
21 mod 10 = 1 Collision
21
2
61
5
6
71
7
8
9
-1
Index Data Chain
0 -1 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 61 7
6 6 -1
7 71 -1
8 8 -1
9 9 -1
• Drawback: Finding the next empty location and that results into the element actually belonging to
that empty location cannot obtain its location.
30

Linear Probing with chaining (without replacement)
Ex: Consider elements as 21, 31, 41, 2, 6, 5, 7, 55 with
table size of 10.
Index Data Chain
0 -1
1 -1
2 -1
3 -1
4 -1
5 -1
6 -1
7 -1
8 -1
9 -1
21
41
2
31
2
5
3
6
-1
Index Data Chain
0 -1 -1
1 21 2
2 31 3
3 41 4
4 2 -1
5 5 8
6 6 -1
7 7 -1
8 55 -1
9 -1 -1
7
55
-1
4
8
31

B]Linear Probing with chaining (with replacement)
• Ex: 131, 21, 31, 4, 5, 2.
Index Data Chain
0 -1
1 -1
2 -1
3 -1
4 -1
5 -1
6 -1
7 -1
8 -1
9 -1
131
31
4
21
2
5
3
2 mod 10 = 2
Index Data Chain
0 -1
1 131 2
2
3 31 -1
4 4 -1
5 5 -1
6
7 -1
8 -1
9 -1
Replace 21 by 2 and
accordingly chain table will
be updated.
2 21 3
21 3
-1
6
Index Data Chain
0 -1 -1
1 131 6
2 2 -1
3 31 -1
4 4 -1
5 5 -1
6 21 3
7 -1 -1
8 -1 -1
9 -1 -1
32

B]Linear Probing with chaining (with replacement)
Ex: 131, 3, 4, 21, 61, 6, 71, 16, 85.
Index Data Chain
0 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 61 7
6 6 -1
7 71 -1
8 -1
9 -1
16 mod 10 = 6 85 mod 10 = 5
Index Data Chain
0 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 61 7
6 6 8
7 71 -1
8 16 -1
9 -1
Index Data Chain
0 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 85 -1
6 6 8
7 71 -1
8 16 -1
9 61 7
131, 3, 4, 21, 61, 6, 71
33

Quadratic Probing
• Problem of clustering
• Accumulation of Consecutive cells n hash table
• When we want to add element we have to go for
many probes means need to search for more
elements to put element in table.
Index Data
0 79
1 41
2 22
3 32
4 44
5 33
6
7
8 18
9 59
• Solution: Quadratic Probing
34

Clustering Example
• Ex: 2, 22, 12, 42, 32
Index Data Chain
0 -1
1 -1
2 2 3
3 22 4
4 12 5
5 42 6
6 32 -1
7 -1
8 -1
9 -1
Index Data
0
1
2 2
3
4
5
6
7
8
9
[Hash(key)+i^2] mod M
(2 + 0^2)mod 10 = 2
(2 + 1^2)mod 10 = 3
22
12
(2 + 2^2)mod 10 = 6
(2 + 3^2)mod 10 = 11  1
42
(2 + 4^2)mod 10 = 18  8
32
35

Quadratic Probing
• Example: 37, 90, 55, 22, 11, 17, 49, 87 with m = 10. Index Data
0 90
1 11
2 22
3
4
5 55
6
7 37
8
9
17
49
87
36

Clustering Example
• Ex: 2, 12, 22, 32, 42, 52, 62
Index Data
0
1 32
2 2
3 12
4
5
6 22
7 52
8 42
9
[Hash(key)+i^2] mod M
(2 + 6^2)mod 10 = 38  8
(2 + 4^2)mod 10 = 18  8
(2 + 7^2)mod 10 = 51  1
(2 + 8^2)mod 10 = 66  6
(2 + 9^2)mod 10 = 83  3
(2 + 10^2)mod 10 = 102  2
This is known as secondary
clustering.
Here we are unable to add
an element in quadrating
probing because the index
calculated is never empty.
37

Double Hashing
• Second hash function is applied to the key when collision occurs.
38

Example
K H1 H2
18
41
22
44
5 7 - (18 % 7) = 7 - 4 = 3
2 1
9 6
5 5
39

Example
K H1 H2
18 5 3
41 2 1
22 9 6
44 5 5
Index Data
0
1
2
3
4
5
6
7
8
9
10
11
12
18
For 41 = 2 + 0 * 1 = 2
41
For 22 = 9 + 0 * 6 = 9 22
For 44 = 5 + 0 * 5 = 5
For 44 = 5 + 1 * 5 = 10
42
40

Hashing techniques, Hashing function,Collision detection techniques

More Related Content

What's hot (20)

Similar to Hashing techniques, Hashing function,Collision detection techniques (20)

Recently uploaded (20)

Hashing techniques, Hashing function,Collision detection techniques

Editor's Notes