Understanding the logic in CaseInsensitiveComparator

From the Unicode Technical Standard:

In addition, because of the vagaries of natural language, there are situations where two different Unicode characters have the same uppercase or lowercase.

So it is not enough to compare only the uppercase forms of two characters: they may have different uppercase forms and still share the same lowercase form.
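The two-step check this implies can be sketched roughly as follows. This is a minimal illustration written for this post, not the JDK source: the class name CaseInsensitiveSketch and the helper compareIgnoreCase are made up, and the sketch works on char values only, so supplementary code points are not handled.

```java
class CaseInsensitiveSketch {
    // Hypothetical helper: compares two chars ignoring case.
    // Uppercase forms are tried first; lowercase forms are the fallback.
    static int compareIgnoreCase(char c1, char c2) {
        if (c1 == c2) {
            return 0;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return 0;
        }
        // Different uppercase forms may still share a lowercase form.
        char l1 = Character.toLowerCase(u1);
        char l2 = Character.toLowerCase(u2);
        return l1 - l2;
    }

    // Hypothetical helper: applies the char comparison position by position,
    // then falls back to the length difference.
    static int compareIgnoreCase(String s1, String s2) {
        int n = Math.min(s1.length(), s2.length());
        for (int i = 0; i < n; i++) {
            int result = compareIgnoreCase(s1.charAt(i), s2.charAt(i));
            if (result != 0) {
                return result;
            }
        }
        return s1.length() - s2.length();
    }

    public static void main(String[] args) {
        // Compare the sketch with the built-in comparator on the same input.
        System.out.println(compareIgnoreCase("I", "\u0130"));
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("I", "\u0130"));
    }
}
```

The lowercase step runs only when the uppercase forms differ, so ordinary pairs such as "a" and "A" are settled by the first comparison.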

A simple brute-force check turns up some examples. Take code points 73 and 304, for instance:

char ch1 = (char) 73;  // U+0049 LATIN CAPITAL LETTER I
char ch2 = (char) 304; // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
System.out.println(ch1 == ch2); // the characters themselves
System.out.println(Character.toUpperCase(ch1) == Character.toUpperCase(ch2)); // their uppercase forms
System.out.println(Character.toLowerCase(ch1) == Character.toLowerCase(ch2)); // their lowercase forms

Output:

false
false
true

So “İ” and “I” are not equal to each other, and because both are already uppercase, comparing their uppercase forms does not help. But they share the same lowercase letter, “i”, and that is the reason to treat them as equal in a case-insensitive comparison.
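A brute-force check along these lines could look like the sketch below. It is one possible way to run the scan, not necessarily the one used for this post: the class name SharedLowercaseScan and the helper findSharedLowercase are made up. It walks every char value, takes the lowercase of its uppercase form, and keeps only the lowercase characters reached from more than one distinct uppercase character.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SharedLowercaseScan {
    // Hypothetical helper: maps a lowercase char to the distinct uppercase
    // chars that lowercase to it, keeping only entries with two or more.
    static Map<Character, Set<Character>> findSharedLowercase() {
        Map<Character, Set<Character>> byLower = new TreeMap<>();
        for (int cp = Character.MIN_VALUE; cp <= Character.MAX_VALUE; cp++) {
            char upper = Character.toUpperCase((char) cp);
            char lower = Character.toLowerCase(upper);
            byLower.computeIfAbsent(lower, k -> new TreeSet<>()).add(upper);
        }
        // A single uppercase form means the uppercase comparison alone suffices.
        byLower.values().removeIf(uppers -> uppers.size() < 2);
        return byLower;
    }

    public static void main(String[] args) {
        for (Map.Entry<Character, Set<Character>> e : findSharedLowercase().entrySet()) {
            StringBuilder uppers = new StringBuilder();
            for (char u : e.getValue()) {
                uppers.append(String.format(" U+%04X", (int) u));
            }
            System.out.printf("lowercase U+%04X <-%s%n", (int) e.getKey().charValue(), uppers);
        }
    }
}
```

The entry for “i” (U+0069) contains both U+0049 and U+0130, which is the pair discussed above. The exact list of entries depends on the Unicode version supported by the JDK in use.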
