What’s the role of GetHashCode in the IEqualityComparer in .NET?

A bit of background first…

Every object in .NET has an Equals method and a GetHashCode method.

The Equals method is used to compare one object with another object – to see if the two objects are equivalent.

The GetHashCode method generates a 32-bit integer representation of the object. Since there is no limit to how much information an object can contain, certain hash codes are shared by multiple objects – so the hash code is not necessarily unique.

A dictionary is a really cool data structure that trades a higher memory footprint in return for (more or less) constant costs for Add/Remove/Get operations. It is a poor choice for iterating over though. Internally, a dictionary contains an array of buckets, where values can be stored. When you add a Key and Value to a dictionary, the GetHashCode method is called on the Key. The hashcode returned is used to determine the index of the bucket in which the Key/Value pair should be stored.

When you want to access the Value, you pass in the Key again. The GetHashCode method is called on the Key, and the bucket containing the Value is located.

When an IEqualityComparer is passed into the constructor of a dictionary, the IEqualityComparer.Equals and IEqualityComparer.GetHashCode methods are used instead of the methods on the Key objects.

Now to explain why both methods are necessary, consider this example:

BoxEqualityComparer boxEqC = new BoxEqualityComparer(); 

Dictionary<Box, String> boxes = new Dictionary<Box, string>(boxEqC); 

Box redBox = new Box(100, 100, 25);
Box blueBox = new Box(1000, 1000, 25);

boxes.Add(redBox, "red"); 
boxes.Add(blueBox, "blue"); 

Using the BoxEqualityComparer.GetHashCode method in your example, both of these boxes have the same hashcode – 100^100^25 = 1000^1000^25 = 25 – even though they are clearly not the same object. The reason that they are the same hashcode in this case is because you are using the ^ (bitwise exclusive-OR) operator so 100^100 cancels out leaving zero, as does 1000^1000. When two different objects have the same key, we call that a collision.

When we add two Key/Value pairs with the same hashcode to a dictionary, they are both stored in the same bucket. So when we want to retrieve a Value, the GetHashCode method is called on our Key to locate the bucket. Since there is more than one value in the bucket, the dictionary iterates over all of the Key/Value pairs in the bucket calling the Equals method on the Keys to find the correct one.

In the example that you posted, the two boxes are equivalent, so the Equals method returns true. In this case the dictionary has two identical Keys, so it throws an exception.

TLDR

So in summary, the GetHashCode method is used to generate an address where the object is stored. So a dictionary doesn’t have to search for it. It just computes the hashcode and jumps to that location. The Equals method is a better test of equality, but cannot be used to map an object into an address space.

Leave a Comment