Hash File Organization in DBMS

Last Updated : 30 Jul, 2025

Hashing techniques are used to retrieve specific data. Searching through every index value to reach the desired data becomes very inefficient as a file grows. In that situation, hashing offers an efficient way to locate the desired data directly on disk without using an index structure. Hash file organization is also known as direct file organization. It allows fast retrieval of a record based on its key.

Hashing relies mainly on the following terms:

  • Data Bucket: A data bucket is a storage location where records are stored. These buckets are also considered storage units.
  • Hash Function: A hash function is a mapping function that maps all search keys to actual record addresses. Generally, a hash function uses a primary key to generate a hash index (address of a data block). Hash functions range from simple to complex mathematical functions.
  • Hash Index: Only part of the full hash value, classically a prefix, is used as the hash index. Each hash index has a depth value that indicates how many bits of the hash value it uses.

Hashing Technique

Data is stored in data blocks at addresses generated by a hash function. The location where these records are stored is called a data block or data bucket. In this organization, a record's address is computed from its key rather than determined by where the record happens to fall in the file. To write a record, the address is first calculated by applying a mathematical function to the record's key, and the record is then saved at the generated address. Records are stored in buckets, which are storage units that can hold one or more records. For example, the hash function h(K) = K mod 7 hashes the keys 35 and 43 to addresses 0 and 1 respectively:

35 mod 7 = 0
43 mod 7 = 1
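
The address calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `h` mirrors the formula in the text, while the `buckets` dictionary is a hypothetical in-memory stand-in for disk blocks.

```python
def h(key: int, num_buckets: int = 7) -> int:
    """Map an integer search key to a bucket address (K mod 7 by default)."""
    return key % num_buckets


# Hypothetical in-memory stand-in for the data buckets on disk.
buckets = {address: [] for address in range(7)}

# Store each record in the bucket chosen by the hash function.
for key in (35, 43):
    buckets[h(key)].append(key)

print(h(35), h(43))  # 0 1
```

Looking a key up later repeats the same calculation, so only one bucket has to be read.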

Hashing Types

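This article covers four hashing schemes. Static and dynamic hashing differ in whether the number of buckets is fixed, while open addressing and separate chaining are ways of resolving collisions:

  • Static Hashing
  • Dynamic Hashing
  • Open addressing
  • Separate chaining

Each type is explained in detail below: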

1. Static Hashing

For a given search key value, the hash function always calculates the same address. For example, a mod 5 hash function can generate only 5 different addresses. The number of available buckets remains constant, so the bucket address produced by static hashing for a given key never changes.

For example,

If you use the hash function mod 5 to get the address for customer ID = 75, you will always get the same bucket address, 0.

The bucket address does not change in this scenario.

75 mod 5 = 0
66 mod 5 = 1
82 mod 5 = 2
93 mod 5 = 3
104 mod 5 = 4

and so on.

[Figure: Static hashing mapping with example]
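
A minimal sketch of static hashing in Python, assuming integer customer IDs and in-memory lists in place of disk buckets. The class name `StaticHashFile` and its methods are hypothetical names chosen for illustration.

```python
class StaticHashFile:
    """Hash file with a fixed number of buckets (the count never changes)."""

    def __init__(self, num_buckets: int = 5):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def address(self, key: int) -> int:
        # The same key always yields the same bucket address.
        return key % self.num_buckets

    def insert(self, key: int, record: str) -> int:
        addr = self.address(key)
        self.buckets[addr].append((key, record))
        return addr

    def lookup(self, key: int):
        # Only the single bucket selected by the hash function is scanned.
        for stored_key, record in self.buckets[self.address(key)]:
            if stored_key == key:
                return record
        return None


customers = StaticHashFile(num_buckets=5)
addresses = {cid: customers.insert(cid, f"customer-{cid}")
             for cid in (75, 66, 82, 93, 104)}
print(addresses)  # {75: 0, 66: 1, 82: 2, 93: 3, 104: 4}
```

Because the bucket count is fixed, a bucket simply keeps growing when many keys share an address. A real system would need overflow handling at that point, which this sketch leaves out.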

2. Dynamic Hashing

In dynamic hashing, data buckets are added or removed dynamically as the data set grows or shrinks. Dynamic hashing is also known as extended (extendible) hashing. It requires the hash function to generate a large number of values, of which only some bits are used at any given time.

For example, there are three data sets: Data1, Data2, and Data3.

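The hash function produces the three addresses 1010, 1011, and 1001. This storage method uses only part of each address to choose a bucket, initially a single bit. The three values above share the same leading bits, so this example reads the address from the low-order (last) end: Data1 ends in 0, while Data2 and Data3 both end in 1.

So we try to load the three of them into the two bucket addresses 0 and 1.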

h(Data 1) -> 1010
h(Data 2) -> 1011
h(Data 3) -> 1001

[Figure: Dynamic hashing mapping, case 1]

The problem is that no free bucket address is left for Data3. The buckets must be expanded dynamically to accommodate it. Therefore, the address is widened from 1 bit to 2 bits, and the existing records are re-addressed using 2 bits.

Data3 can then be stored.

[Figure: Dynamic hashing mapping, case 2]
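
The expansion step can be sketched as follows. This is a deliberately simplified model, not a full extendible-hashing implementation: it assumes each bucket holds one record, it uses the low-order bits of the hash value (the three sample values only differ there), and it rehashes every record when the address grows, whereas a real scheme would split only the overflowing bucket. The class name `DynamicHashFile` and the `max_bits` guard are hypothetical.

```python
class DynamicHashFile:
    """Simplified dynamic hashing: widen the address by one bit on overflow."""

    def __init__(self, bucket_capacity: int = 1, max_bits: int = 16):
        self.depth = 1                    # number of low-order bits in use
        self.capacity = bucket_capacity
        self.max_bits = max_bits          # guard against endless splitting
        self.buckets = {0: [], 1: []}

    def _address(self, hash_value: int) -> int:
        # Keep only the lowest `depth` bits of the hash value.
        return hash_value & ((1 << self.depth) - 1)

    def _grow(self) -> None:
        # Use one more bit and redistribute the existing records.
        self.depth += 1
        old = [rec for bucket in self.buckets.values() for rec in bucket]
        self.buckets = {addr: [] for addr in range(1 << self.depth)}
        for name, hash_value in old:
            self.buckets[self._address(hash_value)].append((name, hash_value))

    def insert(self, name: str, hash_value: int) -> None:
        while len(self.buckets[self._address(hash_value)]) >= self.capacity:
            if self.depth >= self.max_bits:
                raise OverflowError("cannot widen the address any further")
            self._grow()
        self.buckets[self._address(hash_value)].append((name, hash_value))


f = DynamicHashFile()
for name, hv in (("Data1", 0b1010), ("Data2", 0b1011), ("Data3", 0b1001)):
    f.insert(name, hv)

print(f.depth)  # 2
print({format(a, "02b"): [n for n, _ in b] for a, b in f.buckets.items()})
# {'00': [], '01': ['Data3'], '10': ['Data1'], '11': ['Data2']}
```

Data1 and Data2 fit with a 1-bit address. Inserting Data3 finds bucket 1 full, so the address grows to 2 bits and all three records land in separate buckets.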

3. Open Addressing

  • All records are stored in the hash table itself (no separate buckets).
  • If a collision occurs, another empty slot is found by following a probing sequence.

Techniques

  • Linear Probing: Check next slot sequentially.
  • Quadratic Probing: Check slots using a quadratic function.
  • Double Hashing: Use a second hash function to determine the probe step size.
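
The three probing techniques differ only in how the i-th slot to try is computed. The sketch below uses common textbook formulas. The function names are hypothetical, and the second hash function `1 + (key % (m - 1))` is one conventional choice assumed for illustration, not something the article prescribes.

```python
def linear_probe(key: int, i: int, m: int) -> int:
    """i-th slot to try: step one slot at a time."""
    return (key % m + i) % m


def quadratic_probe(key: int, i: int, m: int) -> int:
    """i-th slot to try: offset grows as i squared."""
    return (key % m + i * i) % m


def double_hash_probe(key: int, i: int, m: int) -> int:
    """i-th slot to try: step size comes from a second hash function."""
    step = 1 + (key % (m - 1))  # assumed second hash; never zero
    return (key % m + i * step) % m


m = 7
for probe in (linear_probe, quadratic_probe, double_hash_probe):
    print(probe.__name__, [probe(85, i, m) for i in range(4)])
# linear_probe [1, 2, 3, 4]
# quadratic_probe [1, 2, 5, 3]
# double_hash_probe [1, 3, 5, 0]
```

All three sequences start at the home slot 85 % 7 = 1 and diverge from the second attempt onward.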

Example:

  • Hash table size = 7
  • Hash function: h(key) = key % 7
  • Collision resolution: Linear Probing

Insert the keys: 50, 700, 76, 85, 92, 73

Step-by-step insertion:

Key | Hash (key % 7) | Insert At | Collision? | Final Position (after probing)
50 | 50 % 7 = 1 | 1 | No | 1
700 | 700 % 7 = 0 | 0 | No | 0
76 | 76 % 7 = 6 | 6 | No | 6
85 | 85 % 7 = 1 | 1 | Yes | 2 (next slot)
92 | 92 % 7 = 1 | 1 | Yes | 3 (after 1 and 2 are filled)
73 | 73 % 7 = 3 | 3 | Yes | 4 (next slot after 3)

Final Hash Table (index → value):

Index | 0 | 1 | 2 | 3 | 4 | 5 | 6
Value | 700 | 50 | 85 | 92 | 73 | -- | 76
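
The insertion example above can be reproduced with a short Python sketch. The function names are hypothetical, `None` marks an empty slot, and deletion (which needs tombstones in open addressing) is intentionally left out.

```python
def insert_linear_probing(table: list, key: int) -> int:
    """Insert key using h(key) = key % len(table) with linear probing."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is None:       # first free slot wins
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def search_linear_probing(table: list, key: int):
    """Return the slot holding key, or None if it is absent."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is None:       # an empty slot ends the probe sequence
            return None
        if table[slot] == key:
            return slot
    return None


table = [None] * 7
positions = {key: insert_linear_probing(table, key)
             for key in (50, 700, 76, 85, 92, 73)}

print(positions)  # {50: 1, 700: 0, 76: 6, 85: 2, 92: 3, 73: 4}
print(table)      # [700, 50, 85, 92, 73, None, 76]
```

A search follows the same probe sequence as the insert, which is why 92 is found at slot 3 even though its home slot is 1.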

4. Separate Chaining

In separate chaining, each slot in the hash table holds a linked list of records (or keys) that hash to the same index. This is a common way to handle collisions.

Given:

  • Hash table size = 5
  • Hash function: h(key) = key % 5
  • Keys to insert: 10, 15, 20, 25, 30, 11

Step-by-step hashing:

Key | Hash Index (key % 5) | Inserted At
10 | 0 | Bucket 0 → [10]
15 | 0 | Bucket 0 → [10 → 15]
20 | 0 | Bucket 0 → [10 → 15 → 20]
25 | 0 | Bucket 0 → [10 → 15 → 20 → 25]
30 | 0 | Bucket 0 → [10 → 15 → 20 → 25 → 30]
11 | 1 | Bucket 1 → [11]

Final Hash Table (with separate chaining):

Index | Linked List (Bucket)
0 | 10 → 15 → 20 → 25 → 30
1 | 11
2 | --
3 | --
4 | --

Key Points:

  • All keys that hash to the same index (like 10, 15, 20, etc.) are stored in a linked list at that index.
  • Separate chaining avoids clustering and makes insertion easier.
  • It keeps working when the hash table's load factor is high (even above 1), although lookups slow down as the chains grow longer.
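
A minimal sketch of separate chaining that reproduces the example above. As a simplifying assumption, each chain is a Python list rather than a true linked list, and the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Hash table where each slot holds a chain of keys."""

    def __init__(self, size: int = 5):
        self.size = size
        # One chain per slot; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(size)]

    def insert(self, key: int) -> None:
        # Colliding keys are appended to the chain at the same index.
        self.buckets[key % self.size].append(key)

    def contains(self, key: int) -> bool:
        # Only the chain at the key's index has to be scanned.
        return key in self.buckets[key % self.size]

    def load_factor(self) -> float:
        return sum(len(chain) for chain in self.buckets) / self.size


table = ChainedHashTable(size=5)
for key in (10, 15, 20, 25, 30, 11):
    table.insert(key)

print(table.buckets)        # [[10, 15, 20, 25, 30], [11], [], [], []]
print(table.load_factor())  # 1.2
```

The keys in this example are mostly multiples of 5, so nearly everything lands in one chain. A hash function that spreads keys more evenly would keep the chains short.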
