How do I remove diacritics (accents) from a string in .NET?

I’ve not used this method, but Michael Kaplan describes a method for doing so in his blog post (with a confusing title) that talks about stripping diacritics: Stripping is an interesting job (aka
On the meaning of meaningless, aka All
Mn characters are non-spacing, but
some are more non-spacing than
others)

static string RemoveDiacritics(string text) 
{
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder(capacity: normalizedString.Length);

    for (int i = 0; i < normalizedString.Length; i++)
    {
        char c = normalizedString[i];
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    return stringBuilder
        .ToString()
        .Normalize(NormalizationForm.FormC);
}

Note that this is a followup to his earlier post: Stripping diacritics….

The approach uses String.Normalize to split the input string into constituent glyphs (basically separating the “base” characters from the diacritics) and then scans the result and retains only the base characters. It’s just a little complicated, but really you’re looking at a complicated problem.

Of course, if you’re limiting yourself to French, you could probably get away with the simple table-based approach in How to remove accents and tilde in a C++ std::string, as recommended by @David Dibben.

Leave a Comment