Which database design gives better performance?

Generally speaking, data integrity is more important than performance, so denormalize1 only if you have performed measurements on representative amounts of data and the results indicate a strong need for better performance. More often that not, a normalized schema will perform just fine, especially if you got your physical design right (such as indexing).

In this particular case, my hunch would be that the second (normalized) design will work just fine.

That being said, probably the most efficient design for a query: “which persons live in a given country” would look something like this:

enter image description here

And then cluster the PERSON on its PK. This way, all persons belonging to the same COUNTRY_ID are stored physically close together in the database, dramatically reducing the I/O for the query above2.

On the other hand, you no longer use simple auto-increment mechanism to generate your CITY_NO and PERSON_NO, the secondary indexes on PERSON are expensive because of clustering leading to other queries becoming slower etc. So, this is not absolutely “better” than your second design, they just both have different strengths/weaknesses and you’ll have to decide which is a better trade-off in your particular case.


1 Which prevents the database from “self-defending” itself from bad data. In your case, the first (denormalized) design would allow a person to reference a country that does not contain the city referenced by the same person.

2 I/O tends to be the biggest cost for most queries.

Leave a Comment