ENOTES

Normalization using 1NF 2NF and 3NF

access_time Aug 02, 2021 remove_red_eye 25456

The process of arranging data in several simplified database tables in order to reduce data redundancy i.e. duplication or repetition of data and to maintain data consistency and integrity. This process was introduced by Edgar Frank "Ted" Codd (E.F. Codd). He introduce three normal forms popular known as 1NF, 2NF and 3NF.  The relations between tables in Relational database (RDBMS) is considered as  "normalized" if it meets 3NF (Third Normal Form). 3NF is free from all insertion, deletion and updation anomalies. Repetition of same data again and again consume memory space and processing time which degrades the overall performance of the system. Similarly, due to inconsistency in data the required data may not be available at the time we want and also may cause loss of information. Hence, to reduce such anomalies of database table we use several normal forms to make sure that every database table attributes remains consistent and un-redundant.

Anomalies in RDBMS

In RDBMS several tables are inter-related with each other through table relationship. In such case changing on set of data may directly affect the other set of data. This condition is known as anomalies in DBMS or RDBMS. There are 3 different anomalies that we may encounter while designing a database. They are:

  1. Insertion anomalies: It occurs when data cant be inserted due to absence of other data.
  2. Deletion anomalies: It occurs when we delete one data sets but another data sets gets deleted along with its. Unwanted deletion of data.
  3. Updation anomalies: It occurs when we update a data, but it only gets partially updated through out the several related tables.

Advantages of Normalization

  1. It eliminates data redundancy i.e. duplication of data.
  2. It enhances faster sorting and index creation.
  3. It simplifies the complex structure of tables.
  4. It avoid loss of data through consistency and integrity.
  5. It solves insertion, deletion and updating anomalies.
  6. It improves the performance of a system.

The most popular widely used normal forms are:

  1. First Normal Form (1NF)
  2. Second Normal Form (2NF)
  3. Third Normal Form (3NF)
  4. Boyce Codd Normal Form (BCNF)
  5. Fourth Normal Form (4NF)

Let us consider the un-normalized data attributes.

Teacher_NameSubjectAgeAddress
Ram

Physics

Maths                                                

27Butwal
SitaEnglish25Palpa
HariNepali28Pokhara

1NF (First Normal Form)

Once, we have un-normalized data sets, we arrange then in first normal form by making their each attribute atomic which means in 1NF there should not be repeated data group.

Teacher_NameSubjectAgeAddresss
RamPhysics27Butwal
RamMaths27Butwal
SitaEnglish25Palpa
HariNepali28Pokhara

The above table is in first normal form 1NF.

2NF (Second Normal Form)

After arranging given data sets in 1NF, we eliminate functionally dependent attributes i.e. in this case, Subject depends upon the teacher not on the age and address, similarly age and address also depends upon teacher not on subject. So, in 2NF we remove such functional dependencies by introducing primary key and foreign key.

In order to identify records uniquely we assigned a key value which is known as Primary key. Here, we will give unique ID for each teacher.

Since, Age and Address depend upon Teacher not on Subject we further break table of 1NF as follows: 

Teacher_IDTeacher_NameAgeAddress
1Ram27Butwal
2Sita25Palpa
3Hari28Pokhara

Since, Subject depend upon Teacher not on Age and Address, we also already have assigned a primary key for the teacher_name in above table so same, key can be used to link teacher_name with subject.

Teacher_IDSubject
1Physics
1Maths
2English
3Nepali

The key which was primary key is now considered as foreign key in this table. In general Foreign key are the same key as the primary key of one table used by several other tables to establish relationship.

3NF (Third Normal Form)

As we normalize till 2NF, now to achieve 3NF we have to overcome some more issues associated with 2NF. Let us say, in above table if one teacher left the job, then in this table as functional dependencies are eliminate it will not effect other data.

Teacher_IDTeacher_NameAgeAddress
1Ram27Butwal
2Sita25Palpa
3Hari28Pokhara

But, in the table below if teacher is changed its will affect subject, change in one effect other this is known as transitive functional dependencies.

Teacher_IDSubject
1Physics
1Maths
2English
3Nepali

So, to prevent loss of information due to transitive functional dependencies, we assign separate subject code for each subject, so that even change in the teacher will not affect the subjects,

Subject_IDSubject
Phy1Physics
Mat1Maths
Eng1English
Nep1Nepali

Like this way, 3NF can be achieved after eliminating transitive functional dependencies. Hence, it reduces all three insertion, updation and deletion anomalies.

Boyce Codd Normal Forms (BCNF) 

Edgar Frank "Ted" Codd (E.F. Codd) and Raymond F Boyce introduce Boyce Codd Normal Forms (BCNF) in 1974 AD. It is the slightly stronger version of 3NF to deal with the anomalies which was not addressed by 3NF.

4NF (Fourth Normal Form)

It was introduced by Ronald Fagin in 1977 AD. We deal with functional dependencies in all above normal forms i.e. 2NF, 3NF and even in BCNF whereas 4NF deals with more general types of dependencies called multivalued dependencies.