GetHashCode Guidelines in C#

It’s been a long time, but nevertheless I think it is still necessary to give a correct answer to this question, including explanations about the whys and hows. The best answer so far is the one citing the MSDN exhaustively: don’t try to make your own rules, the MS guys knew what they were doing.

But first things first:
The Guideline as cited in the question is wrong.

Now the whys – there are two of them

First why:
If the hashcode is computed in such a way that it does not change during the lifetime of an object, even if the object itself changes, then it would break the equals-contract.

Remember:
“If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.”

The second sentence is often misinterpreted as “The only rule is that, at object creation time, the hashcode of equal objects must be equal”. I don’t really know why, but that’s about the essence of most answers here as well.

Think of two objects containing a name, where the name is used in the equals method: Same name -> same thing.
Create Instance A: Name = Joe
Create Instance B: Name = Peter

Hashcode A and Hashcode B will most likely not be the same.
What happens now when the Name of instance B is changed to Joe?

According to the guideline from the question, the hashcode of B would not change. The result of this would be:
A.Equals(B) ==> true
But at the same time:
A.GetHashCode() == B.GetHashCode() ==> false.

But exactly this behaviour is explicitly forbidden by the equals&hashcode-contract.
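
To make the Joe/Peter scenario concrete, here is a minimal sketch. The class name `FrozenHashPerson` and its caching field are my own illustration of the guideline from the question, not code from the question itself:

```csharp
using System;

// Hypothetical class following the guideline from the question:
// the hash code is computed once and never changes afterwards,
// while Equals always looks at the current Name.
public sealed class FrozenHashPerson
{
    private int? cachedHash;

    public string Name { get; set; }

    public FrozenHashPerson(string name) { Name = name; }

    public override bool Equals(object obj) =>
        obj is FrozenHashPerson other && Name == other.Name;

    public override int GetHashCode()
    {
        // Computed on the first call only; later changes to Name are ignored.
        if (cachedHash == null)
            cachedHash = Name?.GetHashCode() ?? 0;
        return cachedHash.Value;
    }
}
```

If you create A with "Joe" and B with "Peter", call GetHashCode on both, and then rename B to "Joe", A.Equals(B) becomes true while B keeps returning the hash that was computed for "Peter".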

Second why:
While it is, of course, true that changes in the hashcode could break hashed lists and other objects using the hashcode, the reverse is true as well. In the worst case, not changing the hashcode produces hashed lists in which a lot of different objects have the same hashcode and therefore end up in the same hash bin. This happens when objects are initialized with a standard value, for example.


Now coming to the hows
Well, at first glance, there seems to be a contradiction: either way, code will break.
But neither problem actually comes from a changed or unchanged hashcode.

The source of the problems is well described in the MSDN:

From MSDN’s hashtable entry:

“Key objects must be immutable as long as they are used as keys in the Hashtable.”

This means:

Any object that creates a hash value should change the hash value when the object changes, but it must not, absolutely must not, allow any changes to itself while it is used inside a Hashtable (or any other hash-using object, of course).
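
As a rough illustration of why the key must not change while it is stored, the following sketch mutates an object that already sits in a `HashSet<T>`. The names `MutablePerson` and `MutatedKeyDemo` are made up for this example:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key whose Equals and GetHashCode both follow the current Name.
public sealed class MutablePerson
{
    public string Name { get; set; }

    public MutablePerson(string name) { Name = name; }

    public override bool Equals(object obj) =>
        obj is MutablePerson other && Name == other.Name;

    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

public static class MutatedKeyDemo
{
    // Returns whether the set still finds the object after its Name
    // was changed while the object was stored in the set.
    public static bool FindsKeyAfterMutation()
    {
        var person = new MutablePerson("Peter");
        var set = new HashSet<MutablePerson> { person };

        person.Name = "Joe";   // mutation while the object is used as a key

        // The set filed the object under the hash of "Peter",
        // but this lookup searches with the hash of "Joe".
        return set.Contains(person);
    }
}
```

Barring an accidental hash collision between the two names, the lookup fails even though the very same instance is still inside the set. The contract is not broken here; the rule from the MSDN quote is.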

First how
The easiest way would of course be to design immutable objects solely for use in hashtables, created as copies of the normal, mutable objects when needed.
Inside the immutable objects, it’s obviously ok to cache the hashcode, since they are immutable.
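
One possible shape for this, as a sketch only: `Person`, `PersonKey` and `ToKey` are hypothetical names, and the ordinal string comparison is an assumption on my part:

```csharp
using System;

// Hypothetical mutable working object.
public sealed class Person
{
    public string Name { get; set; }

    public Person(string name) { Name = name; }

    // Creates an immutable copy that is safe to use as a hash key.
    public PersonKey ToKey() => new PersonKey(Name);
}

// Hypothetical immutable key: there are no setters, so Equals and the
// cached hash code can never drift apart.
public sealed class PersonKey : IEquatable<PersonKey>
{
    private readonly int hash;

    public string Name { get; }

    public PersonKey(string name)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
        // Cached once; safe because Name cannot change afterwards.
        hash = StringComparer.Ordinal.GetHashCode(Name);
    }

    public bool Equals(PersonKey other) =>
        other != null && string.Equals(Name, other.Name, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as PersonKey);

    public override int GetHashCode() => hash;
}
```

The mutable `Person` can keep changing freely; only the snapshot returned by `ToKey()` goes into the hashtable.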

Second how
Or give the object a “you are hashed now” flag, make sure all object data is private, check the flag in all functions that can change the object’s data, and throw an exception if a change is not allowed (i.e. the flag is set).
Now, when you put the object in any hashed area, make sure to set the flag and, likewise, to unset it when it is no longer needed.
For ease of use, I’d advise setting the flag automatically inside the “GetHashCode” method; this way it can’t be forgotten. And the explicit call of a “ResetHashFlag” method will make sure that the programmer has to think about whether or not it is allowed to change the object’s data by now.
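
A minimal sketch of the flag approach could look like this. `LockablePerson` is a hypothetical name, and the choice of `InvalidOperationException` is my assumption; “ResetHashFlag” is the method name suggested above:

```csharp
using System;

// Hypothetical sketch of the "you are hashed now" flag.
public sealed class LockablePerson
{
    private string name;
    private bool hashed;   // set once the hash code has been handed out

    public LockablePerson(string name) { this.name = name; }

    public string Name
    {
        get => name;
        set
        {
            if (hashed)
                throw new InvalidOperationException(
                    "Object is in use as a hash key; call ResetHashFlag() first.");
            name = value;
        }
    }

    public override bool Equals(object obj) =>
        obj is LockablePerson other && name == other.name;

    public override int GetHashCode()
    {
        hashed = true;   // set automatically so it cannot be forgotten
        return name?.GetHashCode() ?? 0;
    }

    // Explicit opt-out: the caller asserts that the object is no longer
    // stored in any hashed collection.
    public void ResetHashFlag() => hashed = false;
}
```

Keep in mind that in this sketch every caller of GetHashCode sets the flag, including code that only wants to log the value, and that the flag is not thread-safe.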

Ok, what should be said as well: there are cases where it is possible to have objects with mutable data whose hashcode nevertheless stays unchanged when the object’s data is changed, without violating the equals&hashcode-contract.

This does however require that the equals-method is not based on the mutable data either.
So, if I write an object and create a GetHashCode method that calculates a value only once and stores it inside the object to return it on later calls, then I must, again: absolutely must, create an Equals method that uses stored values for the comparison, so that A.Equals(B) will never change from false to true either. Otherwise, the contract would be broken. The result of this will usually be that the Equals method doesn’t make any sense: it’s not the original reference equals, but it is not a value equals either. Sometimes this may be intended behaviour (e.g. customer records), but usually it is not.
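
For the customer record case, one way to read this is an object whose identity is a fixed Id while everything else stays mutable. The following is a sketch under that assumption; `CustomerRecord` and its members are hypothetical:

```csharp
using System;

// Hypothetical customer record: Equals and GetHashCode use only the fixed Id,
// so changing the mutable data never changes equality or the hash code.
public sealed class CustomerRecord
{
    public int Id { get; }            // stored once, never changes
    public string Name { get; set; }  // mutable, deliberately not part of Equals

    public CustomerRecord(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public override bool Equals(object obj) =>
        obj is CustomerRecord other && Id == other.Id;

    public override int GetHashCode() => Id.GetHashCode();
}
```

Two records with the same Id but different names compare as equal here, which is exactly the “neither reference equals nor value equals” behaviour described above.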

So, just make the GetHashCode result change when the object data changes, and if the use of the object inside hash-using lists or objects is intended (or just possible), then either make the object immutable or create a read-only flag to use for the lifetime of a hashed list containing the object.

(By the way: all of this is not C# or .NET specific. It is in the nature of all hashtable implementations, or more generally of any indexed list, that identifying data of objects should never change while the object is in the list. Unexpected and unpredictable behaviour will occur if this rule is broken. Somewhere, there may be list implementations that monitor all elements inside the list and automatically reindex the list, but the performance of those will surely be gruesome at best.)
