SQL primary key: integer vs varchar

VARCHAR vs. INT doesn’t tell much. What matter is the access pattern.

On absolute terms, a wider key will always be worse than a narrow key. The type carries absolutely no importance, is the width that matters. When compared with INT though, few types can beat INT in narrowness, so INT usually wins that argument just by the fact that is only 4 bytes wide.

But what really matters is the choice of clustered key. Often confused with the primary key, the two represent different notions and are not required to overlap. Here is a more detailed discussion Should I design a table with a primary key of varchar or int? The choice of the clustered key is just about the most important decision in table design, and a mechanical application of an INT identity(1,1) on it may be just the biggest mistake one can make. Here is where the question of access patterns comes in:

  • what are the most frequent interrogations on the table?
    • what columns are projected?
    • what predicates are applied?
    • what ranges are searched?
    • what joins are performed?
    • what aggregations occur?
  • how is the data inserted into the table?
  • how is the data updated in the table?
  • how is old data purged from the table, if ever?
  • how many non-clustered indexes exist?
    • how often are columns included in the NC indexes (key or leaf) are updated?

Overall, there are many access patterns that can be ruined by using an INT IDENTITY clustered key. So before jumping to apply a cookie cutter solution, perhaps a little bit of analysis is required…

Some more general guidelines:

You see there are no Primary Key design guidelines, because the Primary key is not an issue of storage design but an issue of modeling and is entirely domain driven.

Leave a Comment