
Normalization
is a set of rules that are to be applied while designing the database tables
which are to be connected with each other by relationships. Database normalization is a technical term to improve the data accuracy and efficiency and reduces the data
redundancy and inconsistent data dependency.
We
organize the data into database tables by using normal forms rules or
conditions. Normal forms help us to make a good database design. Generally in
the real practice, we organize the data up to third normal form. We rarely use
the fourth and fifth normal form. These are the Normal forms of the
normalization –
4)
Boyce
Code Normal Form (BCNF)
5)
Fourth
Normal Form (4NF)
6)
Fifth
Normal Form (5NF)
For the examples, we’ll use table Reservation Staff Information as shown below UNF as a starting point. As we pointed out in the last post’s modification anomalies section, there are several issues to keeping the information in this form. By normalizing the data you see we’ll eliminate duplicate data as well as modification anomalies.
Reservation Staff
|
||||||
Project Code
|
Project Name
|
Project Manager
|
Project Budget
|
Employee1
|
Employee2
|
Employee3
|
PC001
|
Reservation
|
Mike Towery
|
$125,000
|
Ryan Arjun
|
Will Smith
|
Bill Gray
|
PC002
|
Attendance
|
Chris Gray
|
$155,000
|
Tony Towery
|
Chao Milk
|
Red Bill
|
PC003
|
Human Resources
|
Kimmy Wang
|
$225,000
|
Lucy Gray
|
Chris Crow
|
Tom Ramsay
|
First Normal Form (1NF)
A database table is said to be in 1NF if it contains no repeating fields/columns. Once a table is in first normal form it is easier to search, filter, and sort the information. The process of converting the UNF table into 1NF is as follows:
1) Separate
the repeating fields into new database tables along with the key from
unnormalized database table.
2)
The
primary key of new database tables may be a composite key.
When
a value is atomic, the values cannot be further subdivided. For example, the value “Reservation” is
atomic; whereas “Reservation; Attendance; Human Resources” is not. Related to
this requirement is the concept that a table should not contain repeating
groups of columns such as Employee1, Employee2 and Employee3.
This
example table is transformed to first normal form by placing the repeating
customer related columns into their own table.
This is shown below:
The
repeating groups of columns now become separate rows in the Reservation Staff
table linked by the Emp ID foreign key.
As we know that a foreign key is a value which matches back to another
table’s primary key.
In
our case, the Reservation Staff table contains the corresponding Emp Id for the
Employee Master now.
Employee Master
|
|||
Emp Id
|
Employee Name
|
EmployeeDept
|
EmployeeRate
|
E0001
|
Ryan Arjun
|
Database
|
240
|
E0002
|
Tony Towery
|
Testing
|
230
|
E0003
|
Lucy Gray
|
IT
|
250
|
E0004
|
Will Smith
|
Database
|
245
|
E0005
|
Chao Milk
|
Testing
|
225
|
E0006
|
Chris Gyal
|
IT
|
210
|
E0007
|
Bill Gray
|
Database
|
190
|
E0008
|
Red Bill
|
Testing
|
210
|
E0009
|
Tom Ramsay
|
IT
|
200
|
Primary Key for the above table is Emp Id
|
Reservation Staff
|
||||
Project Code
|
Emp Id
|
Project Name
|
Project Manager
|
Project Budget
|
PC001
|
E0001
|
Reservation
|
Mike Towery
|
$125,000
|
PC002
|
E0002
|
Attendance
|
Chris Gray
|
$155,000
|
PC003
|
E0003
|
Human Resources
|
Kimmy Wang
|
$225,000
|
PC001
|
E0004
|
Reservation
|
Mike Towery
|
$125,000
|
PC002
|
E0005
|
Attendance
|
Chris Gray
|
$155,000
|
PC003
|
E0006
|
Human Resources
|
Kimmy Wang
|
$225,000
|
PC001
|
E0007
|
Reservation
|
Mike Towery
|
$125,000
|
PC002
|
E0008
|
Attendance
|
Chris Gray
|
$155,000
|
PC003
|
E0009
|
Human Resources
|
Kimmy Wang
|
$225,000
|
Composite Key ( Unique Key) = Project Code + Emp Id
|
This
design is superior to our original table in several ways:
1. The original design limited each Reservation Staff entry to three employees. In the new design, the number of employees associated to each design is practically unlimited.
2. It was nearly impossible to Sort the original data by Reservation Staff. You could, if you used the UNION statement, but it would be cumbersome. Now, it is simple to sort customers.
3. The same holds true for filtering on the Reservation Staff table.
Modification anomalies remain in both tables, but these are fixed once we reorganize them as 2nd normal form.
1. The original design limited each Reservation Staff entry to three employees. In the new design, the number of employees associated to each design is practically unlimited.
2. It was nearly impossible to Sort the original data by Reservation Staff. You could, if you used the UNION statement, but it would be cumbersome. Now, it is simple to sort customers.
3. The same holds true for filtering on the Reservation Staff table.
Modification anomalies remain in both tables, but these are fixed once we reorganize them as 2nd normal form.