Is storing a delimited list in a database column really that bad?

In addition to violating First Normal Form because of the repeating group of values stored in a single column, comma-separated lists have a lot of other more practical problems:

  • Can’t ensure that each value is the right data type: no way to prevent 1,2,3,banana,5
  • Can’t use foreign key constraints to link values to a lookup table; no way to enforce referential integrity.
  • Can’t enforce uniqueness: no way to prevent 1,2,3,3,3,5
  • Can’t delete a value from the list without fetching the whole list.
  • Can’t store a list longer than what fits in the string column.
  • Hard to search for all entities with a given value in the list; you have to use an inefficient table-scan. May have to resort to regular expressions, for example in MySQL:
    idlist REGEXP '[[:<:]]2[[:>:]]' or in MySQL 8.0: idlist REGEXP '\\b2\\b'
  • Hard to count elements in the list, or do other aggregate queries.
  • Hard to join the values to the lookup table they reference.
  • Hard to fetch the list in sorted order.
  • Hard to choose a separator that is guaranteed not to appear in the values

To solve these problems, you have to write tons of application code, reinventing functionality that the RDBMS already provides much more efficiently.

Comma-separated lists are wrong enough that I made this the first chapter in my book: SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

There are times when you need to employ denormalization, but as @OMG Ponies mentions, these are exception cases. Any non-relational “optimization” benefits one type of query at the expense of other uses of the data, so be sure you know which of your queries need to be treated so specially that they deserve denormalization.

Leave a Comment