As documented:
The Length property returns the number of Char objects in this instance, not the number of Unicode characters. The reason is that a Unicode character might be represented by more than one Char. Use the System.Globalization.StringInfo class to work with each Unicode character instead of each Char.
Getting length:
new System.Globalization.StringInfo("友𠂇又").LengthInTextElements
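To make the difference concrete, here is a minimal sketch that prints both counts for the same string. The Program class and the variable name s are just scaffolding for illustration; the sketch assumes the middle character 𠂇 lies outside the Basic Multilingual Plane, so UTF-16 stores it as a surrogate pair.

```csharp
using System;
using System.Globalization;

class Program
{
    static void Main()
    {
        string s = "友𠂇又";

        // Length counts UTF-16 code units (Char values); the middle
        // character takes two of them, hence 4 rather than 3.
        Console.WriteLine(s.Length);                               // 4

        // LengthInTextElements counts text elements instead.
        Console.WriteLine(new StringInfo(s).LengthInTextElements); // 3

        // The Char values at index 1 and 2 form the surrogate pair.
        Console.WriteLine(char.IsSurrogatePair(s, 1));             // True
    }
}
```

Only numbers and a boolean are written to the console, so the result does not depend on the console's encoding.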
Getting each Unicode character is documented here, but it’s much more convenient to make an extension method:
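using System.Collections.Generic;
public static class StringExtensions {
    // Yields each text element of s as its own string.
    public static IEnumerable<string> TextElements(this string s) {
        var en = System.Globalization.StringInfo.GetTextElementEnumerator(s);
        while (en.MoveNext())
        {
            yield return en.GetTextElement();
        }
    }
}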
and use it in a foreach or in a LINQ statement:
foreach (string segment in "友𠂇又".TextElements())
{
    Console.WriteLine(segment);
}
which can also be used for length:
// Count() is a LINQ extension method and needs "using System.Linq;"
Console.WriteLine("友𠂇又".TextElements().Count());