Java. Ignore accents when comparing strings

I think you should be using the Collator class. It allows you to set a strength and locale and it will compare characters appropriately.

From the Java 1.6 API:

You can set a Collator’s strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, “e” and “f” are considered primary differences, while “e” and “ě” are secondary differences, “e” and “E” are tertiary differences and “e” and “e” are identical.
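
To make the strength levels concrete, here is a minimal sketch of how they affect a comparison. The class name CollatorStrengthDemo, the helper equalAt, and the choice of Locale.ENGLISH are my own assumptions for illustration; as the quote says, the exact behaviour is locale dependent, so check it against the locale you actually use.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {

    // Hypothetical helper: true when the two strings compare as equal
    // at the given Collator strength.
    static boolean equalAt(int strength, String a, String b) {
        // Locale.ENGLISH is an assumption; pick the locale of your data.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "Joao";
        // "João", written with an escape to avoid source-encoding problems.
        String accented = "Jo\u00e3o";

        // PRIMARY looks only at base letters, so the accent is ignored.
        System.out.println(equalAt(Collator.PRIMARY, plain, accented));
        // SECONDARY takes accents into account.
        System.out.println(equalAt(Collator.SECONDARY, plain, accented));
        // SECONDARY still ignores case; TERTIARY does not.
        System.out.println(equalAt(Collator.SECONDARY, "joao", "Joao"));
        System.out.println(equalAt(Collator.TERTIARY, "joao", "Joao"));
    }
}
```

So if you really do want an accent-insensitive equality check, PRIMARY strength is the setting to try; if accents should matter but case should not, SECONDARY is the one.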

I think the important point here (which people are trying to make) is that “Joao” and “João” should never be considered equal. But if you are sorting, you don’t want them compared by their raw character values, because “ã” has a higher value than every unaccented letter and you would end up with something like Joao, John, João, which is not good. Using the Collator class handles this correctly.
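
The sorting difference can be sketched like this. The class and method names (CollatorSortDemo, naturalSort, collatorSort) are made up for the example, and Locale.ENGLISH is again an assumption rather than a recommendation.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {

    // Sorts a copy using String.compareTo, i.e. by raw char values.
    static List<String> naturalSort(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy);
        return copy;
    }

    // Sorts a copy using a locale-aware Collator (default strength).
    static List<String> collatorSort(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        // Locale.ENGLISH is an assumption; use the locale of your data.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // Collator implements Comparator<Object>, so it can be passed directly.
        Collections.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // "Jo\u00e3o" is "João", escaped to avoid source-encoding problems.
        List<String> names = Arrays.asList("John", "Jo\u00e3o", "Joao");

        System.out.println(naturalSort(names));  // Joao, John, João
        System.out.println(collatorSort(names)); // Joao, João, John
    }
}
```

With the collator, the two spellings stay next to each other in the sorted list but are still distinct, because the default strength treats the accent as a real (secondary) difference.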
