How is set() implemented?

According to this thread:

Indeed, CPython’s sets are implemented as something like dictionaries
with dummy values (the keys being the members of the set), with some
optimization(s) that exploit this lack of values

So basically a set uses a hashtable as its underlying data structure. This explains the O(1) membership checking, since looking up an item in a hashtable is an O(1) operation, on average.

If you are so inclined you can even browse the CPython source code for set which, according to Achim Domma, was originally mostly a cut-and-paste from the dict implementation.

Note: Nowadays, set and dict‘s implementations have diverged significantly, so the precise behaviors (e.g. arbitrary order vs. insertion order) and performance in various use cases differs; they’re still implemented in terms of hashtables, so average case lookup and insertion remains O(1), but set is no longer just “dict, but with dummy/omitted keys”.

Leave a Comment