How to store data with dynamic number of attributes in a database

If you ever plan on searching for specific attributes, it’s a bad idea to serialize them into a single column, since you’ll have to use per-row functions to get the information out – this rarely scales well.

I would opt for your second choice. Have a list of attributes in an attribute table, the objects in their own table, and a many-to-many relationship table called object attributes.

For example:

objects:
    object_id    integer
    object_name  varchar(20)
    primary key  (object_id)
attributes:
    attr_id      integer
    attr_name    varchar(20)
    primary key  (attr_id)
object_attributes:
    object_id    integer  references (objects.object_id)
    attr_id      integer  references (attributes.attr_id)
    oa_value     varchar(20)
    primary key (object_id,attr_id)

Your concern about performance is noted but, in my experience, it’s always more costly to split a column than to combine multiple columns. If it turns out that there are performance problems, it’s perfectly acceptable to break 3NF for performance reasons.

In that case I would store it the same way but also have a column with the raw serialized data. Provided you use insert/update triggers to keep the columnar and combined data in sync, you won’t have any problems. But you shouldn’t worry about that until an actual problem surfaces.

By using those triggers, you minimize the work required to only when the data changes. By trying to extract sub-column information, you do unnecessary work on every select.

Leave a Comment