The process of arranging data in several simplified database tables in order to reduce data redundancy i.e. duplication or repetition of data and to maintain data consistency and integrity. This process was introduced by Edgar Frank "Ted" Codd (E.F. Codd). He introduce three normal forms popular known as 1NF, 2NF and 3NF. The relations between tables in Relational database (RDBMS) is considered as "normalized" if it meets 3NF (Third Normal Form). 3NF is free from all insertion, deletion and updation anomalies. Repetition of same data again and again consume memory space and processing time which degrades the overall performance of the system. Similarly, due to inconsistency in data the required data may not be available at the time we want and also may cause loss of information. Hence, to reduce such anomalies of database table we use several normal forms to make sure that every database table attributes remains consistent and un-redundant.
In RDBMS several tables are inter-related with each other through table relationship. In such case changing on set of data may directly affect the other set of data. This condition is known as anomalies in DBMS or RDBMS. There are 3 different anomalies that we may encounter while designing a database. They are:
The most popular widely used normal forms are:
Let us consider the un-normalized data attributes.
Teacher_Name | Subject | Age | Address |
Ram | Physics Maths | 27 | Butwal |
Sita | English | 25 | Palpa |
Hari | Nepali | 28 | Pokhara |
Once, we have un-normalized data sets, we arrange then in first normal form by making their each attribute atomic which means in 1NF there should not be repeated data group.
Teacher_Name | Subject | Age | Addresss |
Ram | Physics | 27 | Butwal |
Ram | Maths | 27 | Butwal |
Sita | English | 25 | Palpa |
Hari | Nepali | 28 | Pokhara |
The above table is in first normal form 1NF.
After arranging given data sets in 1NF, we eliminate functionally dependent attributes i.e. in this case, Subject depends upon the teacher not on the age and address, similarly age and address also depends upon teacher not on subject. So, in 2NF we remove such functional dependencies by introducing primary key and foreign key.
In order to identify records uniquely we assigned a key value which is known as Primary key. Here, we will give unique ID for each teacher.
Since, Age and Address depend upon Teacher not on Subject we further break table of 1NF as follows:
Teacher_ID | Teacher_Name | Age | Address |
1 | Ram | 27 | Butwal |
2 | Sita | 25 | Palpa |
3 | Hari | 28 | Pokhara |
Since, Subject depend upon Teacher not on Age and Address, we also already have assigned a primary key for the teacher_name in above table so same, key can be used to link teacher_name with subject.
Teacher_ID | Subject |
1 | Physics |
1 | Maths |
2 | English |
3 | Nepali |
The key which was primary key is now considered as foreign key in this table. In general Foreign key are the same key as the primary key of one table used by several other tables to establish relationship.
As we normalize till 2NF, now to achieve 3NF we have to overcome some more issues associated with 2NF. Let us say, in above table if one teacher left the job, then in this table as functional dependencies are eliminate it will not effect other data.
Teacher_ID | Teacher_Name | Age | Address |
1 | Ram | 27 | Butwal |
2 | Sita | 25 | Palpa |
3 | Hari | 28 | Pokhara |
But, in the table below if teacher is changed its will affect subject, change in one effect other this is known as transitive functional dependencies.
Teacher_ID | Subject |
1 | Physics |
1 | Maths |
2 | English |
3 | Nepali |
So, to prevent loss of information due to transitive functional dependencies, we assign separate subject code for each subject, so that even change in the teacher will not affect the subjects,
Subject_ID | Subject |
Phy1 | Physics |
Mat1 | Maths |
Eng1 | English |
Nep1 | Nepali |
Like this way, 3NF can be achieved after eliminating transitive functional dependencies. Hence, it reduces all three insertion, updation and deletion anomalies.
Edgar Frank "Ted" Codd (E.F. Codd) and Raymond F Boyce introduce Boyce Codd Normal Forms (BCNF) in 1974 AD. It is the slightly stronger version of 3NF to deal with the anomalies which was not addressed by 3NF.
It was introduced by Ronald Fagin in 1977 AD. We deal with functional dependencies in all above normal forms i.e. 2NF, 3NF and even in BCNF whereas 4NF deals with more general types of dependencies called multivalued dependencies.