In what way does denormalization improve database performance?

Denormalization is generally used to either:

Avoid a certain number of queries
Remove some joins

The basic idea of denormalization is that you’ll add redundant data, or group some, to be able to get those data more easily — at a smaller cost; which is better for performances.

A quick examples?

Consider a “Posts” and a “Comments” table, for a blog
- For each Post, you’ll have several lines in the “Comment” table
- This means that to display a list of posts with the associated number of comments, you’ll have to:
  - Do one query to list the posts
  - Do one query per post to count how many comments it has (Yes, those can be merged into only one, to get the number for all posts at once)
  - Which means several queries.
Now, if you add a “number of comments” field into the Posts table:
- You only need one query to list the posts
- And no need to query the Comments table: the number of comments are already de-normalized to the Posts table.
- And only one query that returns one more field is better than more queries.

Now, there are some costs, yes:

First, this costs some place on both disk and in memory, as you have some redundant informations:
- The number of comments are stored in the Posts table
- And you can also find those number counting on the Comments table
Second, each time someone adds/removes a comment, you have to:
- Save/delete the comment, of course
- But also, update the corresponding number in the Posts table.
- But, if your blog has a lot more people reading than writing comments, this is probably not so bad.

More Related Contents:

Leave a Comment Cancel reply