In Java, why must equals() and hashCode() be consistent?

Sure:

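public class Test {
  private final int m, n;

  public Test(int m, int n) {
    this.m = m;
    this.n = n;
  }

  // The hash code depends on both m and n...
  @Override
  public int hashCode() { return n * m; }

  // ...but equality looks only at m, so equal objects can hash differently.
  @Override
  public boolean equals(Object ob) {
    if (ob == null || ob.getClass() != Test.class) return false;
    Test other = (Test)ob;
    return m == other.m;
  }
}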

with:

// needs: import java.util.HashSet; import java.util.Set;
Set<Test> set = new HashSet<Test>();
set.add(new Test(3, 4));                           // hashCode() is 12
boolean b = set.contains(new Test(3, 10)); // false, hashCode() is 30

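Going by equals() alone that should be true, because m == 3 in both cases. But the two hash codes differ (12 versus 30), so the set looks in the wrong place and never gets as far as comparing the objects.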

A HashSet is backed by a HashMap, and in general a HashMap works like this: it has a variable number of what are commonly called “buckets”. The number of buckets can change over time (as entries are added and removed) but it is always a power of 2.

Let’s say a given HashMap has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then masked according to the number of buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you get the last 4 bits, which is a number between 0 and 15 inclusive:

int factor = 4;
int buckets = 1 << factor; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
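Applied to the Test example above, the masking step can be sketched like this. The indexFor helper is a hypothetical name used for illustration, and the sketch leaves out the extra bit-mixing that the real java.util.HashMap applies to the hash code before masking (for small values like these it makes no difference):

```java
final class BucketIndex {
    // Hypothetical helper: maps a hash code to a bucket index,
    // assuming the bucket count is a power of two.
    static int indexFor(int hashCode, int buckets) {
        return hashCode & (buckets - 1);
    }

    public static void main(String[] args) {
        int buckets = 16;
        // new Test(3, 4).hashCode() is 3 * 4 = 12
        System.out.println(indexFor(3 * 4, buckets));  // 12
        // new Test(3, 10).hashCode() is 3 * 10 = 30
        System.out.println(indexFor(3 * 10, buckets)); // 14
    }
}
```

The two objects are equal according to equals() but land in different buckets, which is why contains() comes back false.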

Now if there is already an entry in that bucket you have what’s called a collision. There are multiple ways of dealing with this, but the one used by HashMap (and probably the most common overall) is bucketing, also known as separate chaining: all the entries with the same masked hashCode are put in a list of some kind.

So to find if a given key is in the map already:

  1. Calculate the masked hash code;
  2. Find the appropriate bucket;
  3. If it’s empty, key not found;
  4. If it isn’t empty, loop through all entries in the bucket checking equals().
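The four steps above can be sketched as a toy set. This is a minimal illustration, not how java.util.HashMap is actually written: TinyHashSet is a hypothetical class, it assumes a fixed power-of-two bucket count, and it never resizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fixed-size hash set, for illustration only.
final class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<List<E>>();
    private final int mask;

    // bucketCount is assumed to be a power of two, e.g. 16.
    TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<E>());
        }
        mask = bucketCount - 1;
    }

    void add(E e) {
        List<E> bucket = buckets.get(e.hashCode() & mask);
        if (!bucket.contains(e)) {   // contains() relies on equals()
            bucket.add(e);
        }
    }

    boolean contains(Object key) {
        int index = key.hashCode() & mask;     // 1. masked hash code
        List<E> bucket = buckets.get(index);   // 2. find the bucket
        if (bucket.isEmpty()) {
            return false;                      // 3. empty: key not found
        }
        for (E e : bucket) {                   // 4. linear scan with equals()
            if (key.equals(e)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<String>(16);
        set.add("apple");
        System.out.println(set.contains("apple")); // true
        System.out.println(set.contains("pear"));  // false
    }
}
```

Note that equals() is only ever called on entries in the one bucket chosen by the hash code, so an equal object sitting in another bucket is never seen.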

Looking through a bucket is a linear (O(n)) operation but it’s on a small subset. Determining the bucket from the hash code is essentially constant (O(1)). As long as the buckets stay small, access to a HashMap is usually described as “near O(1)”.

You can make a couple of observations about this.

Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I’ve actually been asked this in an interview.
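To see the first observation in action, here is a small sketch with a hypothetical Key class whose hashCode() always returns 42. The class and method names are made up for this example:

```java
import java.util.HashSet;
import java.util.Set;

final class ConstantHashDemo {
    // Hypothetical key type: a legal but useless hash code.
    static final class Key {
        final int id;

        Key(int id) { this.id = id; }

        @Override
        public int hashCode() { return 42; } // same bucket for every key

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Builds a set holding keys 0 .. n-1.
    static Set<Key> build(int n) {
        Set<Key> set = new HashSet<Key>();
        for (int i = 0; i < n; i++) {
            set.add(new Key(i));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Key> set = build(1000);
        System.out.println(set.size());                 // 1000
        System.out.println(set.contains(new Key(500))); // true
        System.out.println(set.contains(new Key(1000))); // false
    }
}
```

The answers are all correct because equals() and hashCode() are consistent here. The only thing lost is speed, since every key ends up in the same bucket.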

Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) and b.equals(a) are both true) but have different hash codes, then the HashMap will go looking in (probably) the wrong bucket. The result is unpredictable behaviour: a lookup can miss a key that is actually present, and a put() can store a second copy of a key that is already there.
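One way to repair the original class, assuming equality really is meant to depend on m alone, is to derive the hash code from exactly the fields that equals() compares. ConsistentTest is a hypothetical name so it does not clash with the Test class above:

```java
import java.util.HashSet;
import java.util.Set;

final class ConsistentTest {
    private final int m, n;

    ConsistentTest(int m, int n) {
        this.m = m;
        this.n = n;
    }

    // Uses only m, the same field equals() compares.
    @Override
    public int hashCode() { return Integer.hashCode(m); }

    @Override
    public boolean equals(Object ob) {
        if (ob == null || ob.getClass() != ConsistentTest.class) return false;
        ConsistentTest other = (ConsistentTest) ob;
        return m == other.m;
    }

    public static void main(String[] args) {
        Set<ConsistentTest> set = new HashSet<ConsistentTest>();
        set.add(new ConsistentTest(3, 4));
        System.out.println(set.contains(new ConsistentTest(3, 10))); // true
    }
}
```

Now equal objects always produce the same hash code, so the lookup reaches the right bucket and equals() gets the chance to match.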
