How to compare Unicode characters that “look alike”?

Because it is really different symbols even they look the same, first is the actual letter and has char code = 956 (0x3BC) and the second is the micro sign and has 181 (0xB5).

References:

So if you want to compare them and you need them to be equal, you need to handle it manually, or replace one char with another before comparison. Or use the following code:

public void Main()
{
    var s1 = "μ";
    var s2 = "µ";

    Console.WriteLine(s1.Equals(s2));  // false
    Console.WriteLine(RemoveDiacritics(s1).Equals(RemoveDiacritics(s2))); // true 
}

static string RemoveDiacritics(string text) 
{
    var normalizedString = text.Normalize(NormalizationForm.FormKC);
    var stringBuilder = new StringBuilder();

    foreach (var c in normalizedString)
    {
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}

And the Demo

Leave a Comment