What is the best database schema to support values that are only appropriate to specific rows?

Ok, this is the ER model of what you currently have (omitting cardinalities):

Now, let’s focus on the Calendar and SubCalendar. Clearly, you have a hierarchy there. But how are hierarchies turned into tables? There are three common ways to do this:

1) Kill the parent and keep the children: In this case you remove the parent entity and send all the fields from that entity to each of the children. In your example you only have one child, so all the parent’s attributes will go only to it.

Advantages: No null values as each table will have all it needs. No joins are required either. If you will be running queries searching just for one type of children this schema will be useful because you won’t need to filter by type because each table will store only one type

Disadvantages: This schema is not appropriate for cases where you have overlapping children. In other words, if a parent row can have more than one children when sending the fields to each child the parent data will be duplicated in each children. Not good, so don’t use this strategy if that is the case. Additionally, if you have many children and very few records in each, you’ll have many tables with few records each, so it might become a little harder to manage

2) Kill children and keep the parent: In this case you remove all the children and send all of their attributes to the parent. As the parent is now a mix of itself and all of its children it needs a way to determine what row belongs to what type of children. This is accomplished by adding a new attribute to the parent entity that will determine the type of each row (no matter the data type).

Advantages: There will only be one table for all children, so that is easy to manage. No joins are required. Might be useful if the most queries that are run against this table require results from more than one type of children.

Disadvantages: Again, if a parent can have a row that relates to more than one children data will be duplicated as there will be one row per each of them, so there is a limitation in this solution. Additionally, a new column will have to be added as metadata. The volume of records in the table will be the larger. Null values will have to be assigned to the data that children have and the parent nor other children have.

3) Keep all: The least bloody solution is not to kill anything 🙂 In this case the hierarchy is replaced by a relationship between the parent and each of the children. That way, a child will have to join to the parent table by means of a foreign key to reach the parent’s data.

Advantages: There is no data duplication nor null values. Each entity has just the minimal amount of data and the rest can be obtained by joining to the parent table. In this case, a parent row can be linked to more than one children without duplicating data. If there will be run many queries that can be satisfied with only one table (usually the parent) this is a good option. One more thing is that it is easy to extend to more calendars, for example, if a new calendar is to be added that requires new fields, then a new table has to be added, without modifying the current ones

Disadvantages: Requires the most tables (actually one more than the first one). A join is needed per each children which will degrade performance the larger the dataset becomes. Additionally, foreign keys will be needed to join both tables. If the most queries will require data from parent and children this schema will be the worst in terms of performance

Now, you asked which is the best database schema. I think it is now clear it depends on the requirements, types of queries that will be run, the way data is structured, etc.

However, I can analyze this a little bit more. You said that you had a Calendar table and sometimes more data is needed for one of them. So we can say we have 2 types of calendars, the parent and the child. So we might think that going for solution 2 is a good possibility because you’ll have 2 rows representing each type, but we would be wrong. This is because each child includes its parent in this case. Now, if we can assume that if SubAttribute will always be non-null for a child and null for a parent we can even remove CalendarType, which will actually lead to solution 1.

Finally, as a rule of thumb (and mainly because most queries have lots of joins in real life), if you want to focus on performance, you should go for solution 1, otherwise, if you want to focus on having normalized design you should go for solution 3.

I hope this has cleared some doubts and possibly generated others 🙂

Leave a Comment