Does an empty SQL table have a superkey? Does every SQL table have one?

TL;DR “Superkey” is a RM (Relational Model of Data) term. There’s no standard use in SQL. The superkeys of an SQL table might reasonably informally be said to be the column sets that you could declare primary key or unique not null, plus maybe {} when a table holds at most one row (although you can’t declare it). “Reasonably informally” because SQL tables are not RM relations. But if a table holds no duplicate rows and no nulls then we can reasonably say that it is a relation, and that, like every relation, it has one or more superkeys. The definition of superkey of a base relation or relation expression takes into account all possible values it can hold, so its current value doesn’t affect what its superkeys are. Per the definition of superkey, in an empty relation value every subset of attributes is a superkey.

Relational “superkey”

In mathematics one meaning of “relation” is a table-like set of row-like “tuples” that are lists of values. It represents a relation(ship)/association–also called a “relation” in mathematics. That is where the “R” in “RM” comes from, which is where the term “relational database” comes from. (Codd 1970) (Date 2015) Similarly “ERM” (Entity-Relationship Model) comes from “relationship” as relation/association. (Chen 1976) In a RM context a “relation” is also table-like but usually holds a set of “tuples” that are sets of pairs of “attribute” names & values. (Or it might be a math relation or a mix.) There are two RM senses of “superkey”–of a relation value & of a relation variable or expression. A superkey of a relation value is a set of attributes where a relation does not contain two rows with that subtuple. A superkey of a relation variable or expression is a set of attributes where in every situation/state it does not contain two rows with that subtuple. So a variable has a certain superkey when all the values it can hold have that superkey.

(Find a definition in a published academic textbook. Note that when definitions say “for every” or “for all” values for a name they mean that such a condition is satisfied when there is no such value. Similarly when “for some” & “there exist(s)” refer to named values they don’t mean that the names necessarily name different values.)

An empty value happens to have every subset of attributes as a superkey. The definition of superkey of a variable or expression involving variables takes into account all possible values it can evaluate to, so its current value doesn’t affect what its superkeys are.

Every relation has one or more superkeys: A relation holds a set of tuples, so a tuple value appears at most once, so a value for a subtuple on all attributes appears at most once, so the set of all attributes is a superkey.

SQL vs Relational

An SQL table is not a relation. It is reminiscent of a jumble of math & attribute relations with duplicates and nulls allowed. So SQL databases are called “relational” but they poorly embody the RM.

Because of the resemblance of SQL tables to relations, terms that involve relations get sloppily applied to tables. But although you can borrow terms and give them SQL meanings–value, table, FD (functional dependency), superkey, CK (candidate key), PK (primary key), FK (foreign key), join, and, predicate, NF (normal form), normalize, 1NF, etc–you can’t just substitute those SQL meanings for those words in RM definitions, theorems or algorithms and get something sensible or true. Moreover SQL presentations of RM notions almost never actually tell you how to soundly apply RM notions to an SQL database. They just parrot RM presentations, oblivious to whether their use of SQL meanings for terms makes things nonsensical or invalid. (“Almost” because I hope there are some.)

If you replace “relation” by “table” (duplicates and/or nulls allowed) in certain RM superkey definitions then you get a definition for SQL superkey as a column set that satisfies a primary key or unique not null constraint. For certain other RM superkey definitions you get those sets plus {} when a table holds at most one row. (Since it “identifies” any row.) (You will probably only find people who use a 2nd-style phrasing yet think it defines what a 1st-style phrasing does. And they won’t know they are misusing definitions by misinterpreting terms.) Some might just use the constraint definition. You might find “UK” (unique key) used per any of the three.

When a table holds neither duplicate rows nor nulls we can interpret it as a relation, with rows as tuples & columns as attributes. Then we can reasonably say the table’s superkeys are the relation’s superkeys.

“1NF” has no single meaning. Nor does “normalized” or “unnormalized” or “UNF” or “0NF” or for that matter “relation”.
What to do with null values when modeling and normalizing?

PS: “CK” Don’t confuse superkeys with CKs. A CK is a superkey that contains no smaller superkey. (Hence, we say that a CK is a “minimal” or “irreducible” superkey.) A relation can have multiple superkeys & CKs. A PK is some CK chosen to be distinguished as PK. SQL primary key & unique not null declare what we could call a SQL superkey, but not necessarily a minimal one, what we could call an SQL CK. So when you hear “PK” in an SQL context you have to find out whether it means “(SQL superkey) column list declared via primary key (maybe or maybe not a SQL CK)” and/or “distinguished SQL superkey (maybe or maybe not declared via primary key)” and/or “distinguished SQL CK (maybe or maybe not declared via primary key)”. And you always have to ask what “key” means. Usually, SQL superkey–whatever that means.

PS: “Relation(ship)” Get straight what you mean by each of “relation” & “relationship”–association? table? FK? In a RM database every relation value (of a variable or expression) represents a relation(ship)/association. But “relationship” (sometimes, “relation”) is also (mis)used (in an entrenched way) for FK–not in the RM or ERM, but in pseudo-RM & -ERM methods that misinterpret/misunderstand/misrepresent them, whose roots predate them. (Unfortunately there is very poor RM education in the database industry.) FKs, PKs, CKs, superkeys & other constraints are not needed to query & update. (They are for integrity.)

Leave a Comment