What is Normalization?
So what's all this hoopla about Normalization? You make a table with a Public Key then hook it to your application and presto, you have all the records. Well Normalization isn't about making 1 gigantic table, nor is it about making a ton of partially related tables. Normalization is the process of designing a database to reduce redundancy. That doesn't mean that it won't get complicated, in fact it probably will, what it means that it will not create unnecessary duplication of data. While it will probably add duplicate data, the total redundancy of all that data will not be as redundant, which will result in a smaller and more efficient database.
The task of normalizing a database is step through the process of reducing redundant data in steps called Normal Form. Each step is a continual process of further reducing the redundancy through these Normal Forms.
The first of these steps is called First Normal Form (1NF) and as the steps progress subsequent levels of normalization are aptly called Second Normal Form (2NF), Third Normal Form (3NF), Fourth Normal Form (4NF), and lastly Fifth Normal Form (5NF). You as well as most Database Developers will never have to anguish through the 4NF nor the 5NF steps of Normalization. My favorite and feel the easiest to comprehend and to work with is 2NF. Below are summations of each level of Normalization.
0th normal form is the classic spreadsheet layout for a database table. Characteristic of a spreadsheet layout are:
- Repeating groups of fields
- Positional dependence of data
- Non-Atomic data
0NF - was the database design of a former employer, they invented this technique by just making one great big table; no keys no nothing. They also thought it would be convinient.
A relation is in 1NF if it contains no multi-valued attributes. Thus a table that contains multi-valued attributes or repeating groups is not a relation. So one could say that 1NF sets the very basic rules for an organized database:
- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
A relation is said to be in 2NF if it is in 1NF and every nonkey attribute is fully functionally dependant on the primary key. So 2NF further addresses the concept of removing duplicative data:
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.
A relation is 3NF if it is in 2NF and no transitive dependencies exist. A transitive dependency in a relation is a functional dependency between two or more nonkey attributes. So (3NF) goes one large step further:
- Remove columns that are not dependent upon the primary key.
4NF has one requirement:
- A relation is in 4NF if it has no multi-valued dependencies.
And 5NF deals with the problem when relations can no longer be decomposed. In some rare cases, a relation can have problems like redundant information and update anomalies because of it but cannot be decomposed in two relations to remove the problems. In such cases it may be possible to decompose the relation in three or more relations using the 5NF.
The fifth normal form deals with join-dependencies which is a generalization of the MVD. The aim of fifth normal form is to have relations that cannot be decomposed further. A relation in 5NF cannot be constructed from several smaller relations.
algorithms | dbms | html | j2ee | mis | networking | os | se | more...