Hashing in DBMS is a technique to quickly locate a data record in a database irrespective of the size of the database. For large databases containing thousands or millions of records, the indexing technique becomes inefficient, because searching for a specific record through an index takes more time as the index grows. This conflicts with a key goal of a DBMS: keeping data retrieval time as low as possible. The hashing technique is used to counter this problem. In this article, we will learn about various hashing techniques.
What is Hashing?
The hashing technique uses an auxiliary hash table to store the data records, with a hash function deciding where each record goes. There are 3 key components in hashing:
- Hash Table: A hash table is an array or similar data structure whose size is determined by the total volume of data records in the database. Each memory location in a hash table is called a 'bucket' or hash index; it stores a data record's exact location and is accessed through a hash function.
- Bucket: A bucket is a memory location (index) in the hash table that stores the data record. A bucket generally corresponds to a disk block, which in turn stores multiple records. It is also known as the hash index.
- Hash Function: A hash function is a mathematical equation or algorithm that takes a data record's primary key as input and computes the hash index as output.
Hash Function
A hash function is a mathematical algorithm that computes the index or the location where the current data record is to be stored in the hash table so that it can be accessed efficiently later. This hash function is the most crucial component that determines the speed of fetching data.
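As a minimal sketch, assuming integer primary keys, the modulo hash function used throughout this article can be written in Python as below. The function name `hash_index` is hypothetical and not part of any DBMS API.

```python
def hash_index(key: int, num_buckets: int) -> int:
    """Map an integer primary key to a bucket index in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Deterministic: the same key always yields the same index, so a record
    # stored under this index can be found again by recomputing it.
    return key % num_buckets


if __name__ == "__main__":
    for emp_id in (105, 106, 107, 111):
        print(emp_id, "->", hash_index(emp_id, 5))
```

Here 106 and 111 both map to index 1, which is exactly the collision case described in the next section.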
Working of Hash Function
The hash function generates a hash index through the primary key of the data record.
Now, there are 2 possibilities:
1. The generated hash index is not yet occupied by any other value, so the address of the data record is stored there.
2. The generated hash index is already occupied by some other value. This is called a collision, and a collision resolution technique is applied to handle it.
Whenever we later query a specific record, the hash function is applied to its key again and the record is returned comparatively faster than with indexing, because the hash function leads directly to the record's location instead of searching through index entries one by one.
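The two cases above, plus the lookup step, can be sketched as a tiny fixed-size table. This is an illustrative assumption, not a DBMS implementation: each slot holds one (key, record address) pair, the record addresses are hypothetical strings standing in for disk locations, and a collision is only reported, not resolved.

```python
from typing import List, Optional, Tuple


class SimpleHashTable:
    """Fixed-size table; each slot holds at most one (key, record_address)."""

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self.slots: List[Optional[Tuple[int, str]]] = [None] * num_buckets

    def _index(self, key: int) -> int:
        return key % self.num_buckets

    def insert(self, key: int, record_address: str) -> bool:
        """Return True if stored, False if a collision was detected."""
        idx = self._index(key)
        entry = self.slots[idx]
        if entry is None or entry[0] == key:
            # Case 1: the slot is free (or already holds this key): store here.
            self.slots[idx] = (key, record_address)
            return True
        # Case 2: another key occupies the slot -> collision. A resolution
        # technique (chaining, open addressing, ...) would take over here.
        return False

    def search(self, key: int) -> Optional[str]:
        # Recompute the hash and go straight to one slot; no index scan.
        entry = self.slots[self._index(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None


if __name__ == "__main__":
    table = SimpleHashTable(5)
    print(table.insert(106, "block 7, offset 2"))  # slot 1 is free
    print(table.insert(111, "block 3, offset 0"))  # 111 % 5 == 1: collision
    print(table.search(106))
```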
Types of Hashing in DBMS
There are two primary hashing techniques in DBMS.
1. Static Hashing
In static hashing, the hash function always maps a given key to the same bucket address. For example, suppose we have a data record with employee_id = 106 and the hash function is mod-5, i.e. H(x) = x % 5, where x = id. Then the operation takes place like this:
H(106) = 106 % 5 = 1.
This indicates that the data record should be placed in, or searched for in, bucket 1 (hash index 1) of the hash table.
The primary key is used as the input to the hash function, and the hash function outputs the hash index (the bucket's address), which contains the address of the actual data record on the disk block.
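A minimal sketch of static hashing follows, assuming 5 buckets as in the employee_id example and modelling each bucket as one disk block with a hypothetical capacity of 2 records. The class and field names are illustrative only. Because the bucket count never changes, a full block simply raises an error here.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class StaticHashFile:
    """Static hashing: the bucket count is fixed when the file is created.

    Each bucket models one disk block holding up to `block_capacity` records.
    """

    def __init__(self, num_buckets: int = 5, block_capacity: int = 2) -> None:
        self.num_buckets = num_buckets
        self.block_capacity = block_capacity
        self.buckets: List[List[Record]] = [[] for _ in range(num_buckets)]

    def bucket_of(self, employee_id: int) -> int:
        # H(x) = x % num_buckets; a given key always lands in the same bucket.
        return employee_id % self.num_buckets

    def insert(self, record: Record) -> int:
        b = self.bucket_of(record["employee_id"])
        if len(self.buckets[b]) >= self.block_capacity:
            # Static hashing cannot add buckets: this is bucket overflow.
            raise OverflowError(f"bucket {b} is full")
        self.buckets[b].append(record)
        return b

    def find(self, employee_id: int) -> Optional[Record]:
        # Only the single bucket chosen by the hash function is scanned.
        for rec in self.buckets[self.bucket_of(employee_id)]:
            if rec["employee_id"] == employee_id:
                return rec
        return None


if __name__ == "__main__":
    f = StaticHashFile()
    print(f.insert({"employee_id": 106, "name": "Asha"}))  # bucket 1
    print(f.find(106))
```

Raising an error on a full block keeps the sketch short; the overflow-handling techniques described below (chaining and open addressing) are what a real static hash file would use instead.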
Static hashing has the following properties:
- Data Buckets: The number of buckets in memory remains constant, and the size of the hash table is decided up front. Chaining can be added to handle some collisions, but it is only a slight optimization and may not be worthwhile if the database size keeps fluctuating.
- Hash Function: It uses a simple hash function, generally the modulo hash function, to map each data record to its bucket.
- Efficient for Known Data Size: It is very efficient when the data size and its distribution in the database are known in advance.
- Inefficient for Varying Data Size: Because space is limited and the hash function always generates the same value for a given input, static hashing performs poorly when the data size changes dynamically. If the data size fluctuates often, collisions keep occurring, leading to problems such as bucket skew and insufficient buckets.
To resolve this problem of bucket overflow, techniques such as chaining and open addressing are used. Here is a brief overview of both:
1. Chaining
Chaining is a mechanism in which the hash table is implemented as an array of nodes, where each bucket can hold a chain (linked list) of nodes storing the data records. So, even if the hash function generates the same value for several data records, each of them can still be stored in that bucket by appending a new node to its chain.
However, this gives rise to the problem of bucket skew: if the hash function keeps generating the same value again and again, hashing becomes inefficient, because one chain grows long while the remaining buckets stay empty or hold very little data.
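A minimal sketch of chaining, assuming integer keys and a modulo hash. For brevity each chain is a Python list rather than a hand-written linked list; the behaviour (append on collision, scan only one chain on lookup) is the same. The usage lines also reproduce bucket skew with keys that are all multiples of 5.

```python
from typing import Any, List, Optional, Tuple


class ChainedHashTable:
    """Separate chaining: every bucket holds a chain of (key, value) pairs."""

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self.buckets: List[List[Tuple[int, Any]]] = [[] for _ in range(num_buckets)]

    def _index(self, key: int) -> int:
        return key % self.num_buckets

    def insert(self, key: int, value: Any) -> None:
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: overwrite
                return
        chain.append((key, value))  # new key: extend the chain, collision or not

    def search(self, key: int) -> Optional[Any]:
        # Only the chain of one bucket is scanned.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def chain_lengths(self) -> List[int]:
        return [len(chain) for chain in self.buckets]


if __name__ == "__main__":
    table = ChainedHashTable(5)
    for key in range(0, 50, 5):  # every key is a multiple of 5
        table.insert(key, f"record {key}")
    print(table.chain_lengths())  # [10, 0, 0, 0, 0]: bucket skew
```

All ten keys end up in bucket 0, so every lookup degrades to a scan of one long chain while the other four buckets sit empty.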
2. Open Addressing/Closed Hashing
Open addressing, also called closed hashing, aims to solve the problem of collisions by looking for the next available empty slot in which to store the data. It uses techniques like linear probing, quadratic probing, double hashing, etc.
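Of the probing techniques listed, linear probing is the simplest. A minimal sketch follows, assuming integer keys and a modulo hash; deletion is omitted because it needs extra bookkeeping (for example tombstone markers) to keep probe sequences intact.

```python
from typing import List, Optional, Tuple


class LinearProbingTable:
    """Open addressing with linear probing: on a collision, try the next slot
    (wrapping around at the end) until an empty slot is found."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[Optional[Tuple[int, str]]] = [None] * size

    def insert(self, key: int, value: str) -> int:
        """Store key/value and return the slot index actually used."""
        start = key % self.size
        for step in range(self.size):
            idx = (start + step) % self.size
            slot = self.slots[idx]
            if slot is None or slot[0] == key:
                self.slots[idx] = (key, value)
                return idx
        raise OverflowError("hash table is full")

    def search(self, key: int) -> Optional[str]:
        start = key % self.size
        for step in range(self.size):
            slot = self.slots[(start + step) % self.size]
            if slot is None:
                return None  # an empty slot ends the probe sequence
            if slot[0] == key:
                return slot[1]
        return None


if __name__ == "__main__":
    table = LinearProbingTable(5)
    print(table.insert(106, "A"))  # home slot 1
    print(table.insert(111, "B"))  # slot 1 taken -> slot 2
    print(table.insert(116, "C"))  # slots 1, 2 taken -> slot 3
    print(table.search(116))
```

Quadratic probing and double hashing differ only in how the next index is computed from `start` and `step`.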
2. Dynamic Hashing
Dynamic hashing, whose best-known form is extendible hashing, is used to handle databases whose data sets change frequently. This method lets us add and remove data buckets dynamically, on demand. As the number of data records varies, the number of buckets grows and shrinks with it.
Properties of Dynamic Hashing
- The number of buckets changes dynamically as records are inserted and deleted, offering more flexibility.
- Dynamic hashing improves overall performance by reducing collisions and bucket overflow.
- It has the following major components: data buckets, a flexible hash function, and directories.
- A flexible hash function is one whose output range changes as the database grows or shrinks, according to the requirements of the database.
- Directories are containers that store pointers to buckets. If a bucket overflows, it is split to keep retrieval of data records efficient. Each directory entry has a directory ID.
- Global Depth: The number of bits in each directory ID. As the number of records grows, more bits are needed; a global depth of k gives 2^k directory entries.
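To make directory IDs and global depth concrete, here is a small sketch. It assumes, hypothetically, that a key's hash value is the key itself and that directory IDs are taken from the least significant bits of that value. The function name `directory_id` is illustrative.

```python
def directory_id(hash_value: int, global_depth: int) -> str:
    """Return the `global_depth` least significant bits of `hash_value`
    as a binary string, i.e. the directory ID this key maps to."""
    if global_depth < 1:
        raise ValueError("global_depth must be at least 1")
    mask = (1 << global_depth) - 1  # e.g. depth 2 -> 0b11
    return format(hash_value & mask, f"0{global_depth}b")


if __name__ == "__main__":
    depth = 2
    print("directory entries:", 2 ** depth)
    for key in (4, 5, 6, 7, 12, 13):
        # Assumption: the key itself is used as its hash value.
        print(key, format(key, "b"), "->", directory_id(key, depth))
```

With a global depth of 2, keys 4 (100) and 12 (1100) share directory ID 00, while 5, 6 and 7 map to 01, 10 and 11.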
Working of Dynamic Hashing
Example: If the global depth is k = 2, keys are mapped to buckets using the k least significant bits (LSBs) of their hash index. That leaves us with the following 4 possibilities: 00, 01, 10 and 11.
The k LSBs of each key's hash index are matched against the directory IDs, and each directory entry points to the bucket for that bit pattern. Each bucket therefore holds the keys whose last k bits, written in binary, equal the ID of a directory entry that points to it.
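The mapping and bucket splitting described above can be sketched as a minimal extendible hash table. Assumptions: integer keys used directly as hash values, directory indexing by least significant bits, and a hypothetical bucket capacity of 2. It also tracks a per-bucket local depth, which is standard in extendible hashing but not covered above. Class and method names are illustrative; deletion and bucket merging are omitted.

```python
from typing import List


class Bucket:
    def __init__(self, local_depth: int, capacity: int) -> None:
        self.local_depth = local_depth  # low bits all keys in this bucket share
        self.capacity = capacity
        self.keys: List[int] = []


class ExtendibleHashTable:
    """Extendible hashing indexed by the least significant bits of each key.

    Assumption: the key itself is used as its hash value.
    """

    def __init__(self, bucket_capacity: int = 2) -> None:
        self.global_depth = 1
        # Directory entry i points to the bucket for keys whose low bits equal i.
        self.directory: List[Bucket] = [
            Bucket(1, bucket_capacity),
            Bucket(1, bucket_capacity),
        ]

    def _dir_index(self, key: int) -> int:
        return key & ((1 << self.global_depth) - 1)

    def search(self, key: int) -> bool:
        return key in self.directory[self._dir_index(key)].keys

    def insert(self, key: int) -> None:
        bucket = self.directory[self._dir_index(key)]
        if key in bucket.keys:
            return
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        self._split(bucket)
        self.insert(key)  # retry; splits again if the keys still collide

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            # Double the directory: entry i + 2^d starts as a copy of entry i.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.local_depth  # the new bit that separates the halves
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, bucket.capacity)
        # Entries that pointed to the old bucket and have the new bit set
        # now point to the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = new_bucket
        # Redistribute the old bucket's keys using the same bit.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (new_bucket if k & bit else bucket).keys.append(k)


if __name__ == "__main__":
    table = ExtendibleHashTable(bucket_capacity=2)
    for key in (4, 5, 6, 7, 12):
        table.insert(key)
    for i, b in enumerate(table.directory):
        print(format(i, f"0{table.global_depth}b"), b.local_depth, b.keys)
```

Inserting 12 overflows the bucket for keys ending in 0, so the directory doubles to a global depth of 2 and that bucket splits. The program prints `00 2 [4, 12]`, `01 1 [5, 7]`, `10 2 [6]` and `11 1 [5, 7]`. Entries 01 and 11 still share one bucket with local depth 1, which is why each bucket keeps its own local depth alongside the global depth.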