Hash File Organization in DBMS

Last Updated : 30 Jul, 2025

Hashing is a technique for retrieving specific records directly. Searching through every index value to reach the desired data becomes very inefficient for large files. In that scenario, hashing can locate the desired data directly on disk without using an index structure. Hash file organization is also known as direct file organization. It permits fast retrieval of records based on a key.

In hashing, the following terms are commonly used:

Data Bucket: A data bucket is a storage location where records are stored. A bucket is treated as a unit of storage.
Hash Function: A hash function is a mapping function that maps search keys to the addresses of the actual records. Generally, a hash function uses the primary key to generate a hash index (the address of a data block). Hash functions range from simple to complex mathematical functions.
Hash Index: A prefix of the full hash value is used as the hash index. Each hash index has a depth value that indicates how many bits of the hash value are used.

Hashing Technique

Data is stored in data blocks at addresses generated by a hash function. The location where these records are stored is called a data block or data bucket. A bucket is a storage unit that can hold one or more records. In this organization, a record's address is computed from its key. To write a record, the address is first calculated by applying a mathematical function to the record's key, and the record is then saved at the generated address. For example, the hash function h(K) = K mod 7 hashes the keys 35 and 43 to addresses 0 and 1 respectively, as shown below:

35 mod 7 = 0
43 mod 7 = 1
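
The address calculation for h(K) = K mod 7 can be sketched in a few lines of Python. The function name bucket_address and its num_buckets parameter are illustrative names chosen here, not part of any DBMS API.

```python
def bucket_address(key: int, num_buckets: int = 7) -> int:
    """Return the bucket address h(K) = K mod num_buckets."""
    return key % num_buckets


# The two keys from the example above.
for key in (35, 43):
    print(f"h({key}) = {bucket_address(key)}")
```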

Hashing Types

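Hashing schemes are of two types, static and dynamic, and collisions are commonly resolved by open addressing or separate chaining. This article covers:

Static Hashing
Dynamic Hashing
Open Addressing
Separate Chaining

Each of these is explained in detail below: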

1. Static Hashing

For a given search key value, the hash function always computes the same address. For example, a mod 5 hash function can generate only 5 different addresses (0 to 4). The number of available buckets remains constant, so the bucket address generated for a key under static hashing never changes.

For example,

If you use the hash function mod 5 to get the address for customer ID = 75, you will always get the same bucket address, 0.

The bucket address does not change in this scenario.

75 mod 5 = 0
66 mod 5 = 1
82 mod 5 = 2
93 mod 5 = 3
104 mod 5 = 4

and so on.
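
As a minimal sketch of static hashing, the snippet below places the five customer IDs from the example into a fixed set of five buckets. The names NUM_BUCKETS, h, and buckets are assumptions made for illustration. In-memory lists stand in for disk blocks.

```python
NUM_BUCKETS = 5  # fixed for the lifetime of the file in static hashing


def h(customer_id: int) -> int:
    """Static hash function: customer_id mod 5."""
    return customer_id % NUM_BUCKETS


# One list per bucket; a bucket may hold more than one record.
buckets = [[] for _ in range(NUM_BUCKETS)]
for customer_id in (75, 66, 82, 93, 104):
    buckets[h(customer_id)].append(customer_id)

print(buckets)  # [[75], [66], [82], [93], [104]]
```

Because the bucket count never changes, any further key whose remainder is 0 (for example 80) would share bucket 0 with 75.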

2. Dynamic Hashing

In dynamic hashing, data buckets are added or removed dynamically as the data set grows or shrinks. Dynamic hashing is also known as extendible hashing. Dynamic hashing requires the hash function to generate a large number of values, of which only some bits are used at any given time.

For example, there are three data items: Data 1, Data 2, and Data 3.

The hash function produces three addresses: 1010, 1011, and 1001. This storage method uses only part of each address, initially a single bit. Taking the last bit of each value gives 0, 1, and 1.

So we try to load the three items into the two buckets with addresses 0 and 1.

h(Data 1) -> 1010
h(Data 2) -> 1011
h(Data 3) -> 1001

Data 1 goes to bucket 0 and Data 2 goes to bucket 1. The problem is that Data 3 also maps to bucket 1, so, assuming each bucket holds one record, no bucket address is left for Data 3. The buckets must be expanded dynamically to accommodate it. Therefore, the address is extended from 1 bit to 2 bits, and the existing data is re-addressed with 2-bit addresses (10 for Data 1 and 11 for Data 2).

Data 3 can then be stored at address 01.
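
The bucket expansion described above can be sketched as a small extendible hash table. This is an illustrative sketch under stated assumptions, not a definitive implementation: it indexes the directory by the low-order (last) bits of the hash value, because those are the bits that distinguish the three example values, and it sets the bucket capacity to one record so that the third insert forces a split. The class and method names (Bucket, ExtendibleHashTable, insert, lookup) are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # low-order bits this bucket distinguishes
        self.records = {}               # hash value -> record


class ExtendibleHashTable:
    def __init__(self, bucket_capacity=1):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]  # addresses 0 and 1

    def _slot(self, hash_value):
        # Use the last `global_depth` bits of the hash value as the directory index.
        return hash_value & ((1 << self.global_depth) - 1)

    def insert(self, hash_value, record):
        while True:
            bucket = self.directory[self._slot(hash_value)]
            if hash_value in bucket.records or len(bucket.records) < self.bucket_capacity:
                bucket.records[hash_value] = record
                return
            self._split(bucket)  # bucket is full: split it, then retry

    def lookup(self, hash_value):
        return self.directory[self._slot(hash_value)].records.get(hash_value)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; slot i and slot i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        new_bit = 1 << (bucket.local_depth - 1)
        sibling = Bucket(bucket.local_depth)
        # Slots that pointed at the old bucket and have the new bit set
        # now point at the sibling bucket.
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & new_bit:
                self.directory[slot] = sibling
        # Move records whose new bit is set into the sibling bucket.
        for hv in list(bucket.records):
            if hv & new_bit:
                sibling.records[hv] = bucket.records.pop(hv)


table = ExtendibleHashTable(bucket_capacity=1)
for name, hv in (("Data1", 0b1010), ("Data2", 0b1011), ("Data3", 0b1001)):
    table.insert(hv, name)

print(table.global_depth)    # 2
print(table.lookup(0b1001))  # Data3
```

After the third insert the directory has four slots. Slots 00 and 10 still share the bucket holding Data1, because that bucket has not overflowed and so has not been split.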

3. Open Addressing

All records are stored in the hash table itself (no separate buckets).
If a collision occurs, another empty slot is found by following a probing sequence.

Techniques

Linear Probing: Check next slot sequentially.
Quadratic Probing: Check slots using a quadratic function.
Double Hashing: Use a second hash function to determine the probe step size.
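
The three probing techniques differ only in how the i-th slot to try is computed. The sketch below shows one common textbook form of each. The function names and the second hash function 1 + (key mod (m - 1)) are assumptions chosen for illustration. Other choices are possible.

```python
def linear_probe(key: int, i: int, m: int) -> int:
    """i-th slot tried with linear probing: step forward one slot at a time."""
    return (key % m + i) % m


def quadratic_probe(key: int, i: int, m: int) -> int:
    """i-th slot tried with quadratic probing: offset grows as i squared."""
    return (key % m + i * i) % m


def double_hash_probe(key: int, i: int, m: int) -> int:
    """i-th slot tried with double hashing: the step size comes from a second hash."""
    step = 1 + key % (m - 1)  # assumed second hash; never zero
    return (key % m + i * step) % m


m, key = 7, 85
print([linear_probe(key, i, m) for i in range(4)])       # [1, 2, 3, 4]
print([quadratic_probe(key, i, m) for i in range(4)])    # [1, 2, 5, 3]
print([double_hash_probe(key, i, m) for i in range(4)])  # [1, 3, 5, 0]
```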

Example:

Hash table size = 7
Hash function: h(key) = key % 7
Collision resolution: Linear Probing

Insert the keys: 50, 700, 76, 85, 92, 73

Step-by-step insertion:

Key	Hash (key % 7)	Insert At	Collision?	Final Position (after probing)
50	50 % 7 = 1	1	No	1
700	700 % 7 = 0	0	No	0
76	76 % 7 = 6	6	No	6
85	85 % 7 = 1	1	Yes	2 (next slot)
92	92 % 7 = 1	1	Yes	3 (after 1 and 2 are filled)
73	73 % 7 = 3	3	Yes	4 (next slot after 3)

Final Hash Table (index → value):

Index	0	1	2	3	4	5	6
Value	700	50	85	92	73	--	76
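
The insertion steps above can be reproduced with a short linear probing routine. This is a minimal sketch. The function name insert_linear_probing and the use of None for an empty slot are choices made here for illustration.

```python
TABLE_SIZE = 7


def insert_linear_probing(table: list, key: int) -> int:
    """Insert key using h(key) = key % len(table) with linear probing.

    Returns the slot where the key was stored.
    """
    size = len(table)
    home = key % size
    for step in range(size):
        slot = (home + step) % size  # wrap around at the end of the table
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * TABLE_SIZE
for key in (50, 700, 76, 85, 92, 73):
    insert_linear_probing(table, key)

print(table)  # [700, 50, 85, 92, 73, None, 76]
```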

4. Separate Chaining

In separate chaining, each slot in the hash table holds a linked list of records (or keys) that hash to the same index. This is a common way to handle collisions.

Given:

Hash table size = 5
Hash function: h(key) = key % 5
Keys to insert: 10, 15, 20, 25, 30, 11

Step-by-step hashing:

Key	Hash Index (key % 5)	Inserted At
10	0	Bucket 0 → [10]
15	0	Bucket 0 → [10 → 15]
20	0	Bucket 0 → [10 → 15 → 20]
25	0	Bucket 0 → [10 → 15 → 20 → 25]
30	0	Bucket 0 → [10 → 15 → 20 → 25 → 30]
11	1	Bucket 1 → [11]

Final Hash Table (with separate chaining):

Index	Linked List (Bucket)
0	10 → 15 → 20 → 25 → 30
1	11
2	--
3	--
4	--
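
The same table can be built with a few lines of Python. In this sketch a Python list stands in for each linked list, and the names chains, insert, and contains are illustrative assumptions.

```python
TABLE_SIZE = 5

# One chain per slot; a Python list stands in for a linked list here.
chains = [[] for _ in range(TABLE_SIZE)]


def insert(key: int) -> None:
    """Append key to the chain at index key % TABLE_SIZE."""
    chains[key % TABLE_SIZE].append(key)


def contains(key: int) -> bool:
    """Search only the chain that key hashes to."""
    return key in chains[key % TABLE_SIZE]


for key in (10, 15, 20, 25, 30, 11):
    insert(key)

print(chains)        # [[10, 15, 20, 25, 30], [11], [], [], []]
print(contains(25))  # True
print(contains(12))  # False
```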

Key Points:

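All keys that hash to the same index (like 10, 15, 20, etc.) are stored in a linked list at that index.
Separate chaining avoids the clustering seen in open addressing and makes insertion easier.
It keeps working when the load factor is high, because chains can keep growing, although lookups slow down as chains get longer.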


jitdutpief
