C# “anyString”.Contains(‘\0’, StringComparison.InvariantCulture) returns true in .NET5 but false in older versions

not a bug, a feature

The issue that I’ve opened has been closed, but they gave a very good explanation. Now… In .NET 5.0 they began using on Windows (on Linux it was already present) a new library for comparing strings, the ICU library. It is the official library of the Unicode Consortium, so it is “the verb”. That library is used for CurrentCulture, InvariantCulture (plus the respective IgnoreCase) and and any other culture. The only exception is the Ordinal/OrdinalIgnoreCase. The library is targetted for text and it has some “particular” ideas about non-text. In this particular case, there are some characters that are simply ignored. In the block 0000-00FF I would say the ignored characters are all control codes (please ignore the fact that they are shown as €‚ƒ„†‡ˆ‰Š‹ŒŽ‘’“”•–—™š›œžŸ, at a certain point these characters have been remapped somewhere else in the Unicode, but the glyps shown don’t reflect it, but if you try to see their code, like doing char ch="€"; int val = (int)ch; you’ll see it), and '\0' is a control code.

Now… My personal thinking is that to compare string from today you’ll need a master’s degree in Unicode Technologies 😥, and I do hope that they’ll do some shenanigans in .NET 6.0 to make the default comparison Ordinal (it is one of the proposals for .NET 6.0, the Option B). Note that if you want to make programs that can run in Turkey you already needed a master’s degree in Unicode Technologies (see the Turkish i problem).

In general I would say that to look for words that aren’t keywords/fixed words (for example column names), you should use Culture-aware comparisons, while to look for keywords/fixed words (for example column names) and symbols/control codes you should use Ordinal comparisons. The problem is when you want to look for both at the same time. Normally in this case you are looking for exact words, so you can use Ordinal. Otherwise it becames hellish. And I don’t even want to think how Regex works internally in a Culture-aware environment. That I don’t want to think about. Becasue in that direction there can only be folly and nightmares 😁.

As a sidenote, even before the “default” Culture-aware comparisons had some secret shaeaningans… for example:

int ix = "ʹ$ʹ".IndexOf("$"); // -1 on .NET Framework or .NET Core <= 3.1

what I had written before

I’ll say that it is a bug. There is a similar bug with IndexOf. I’ve opened an Issue on github to track it.

As you have written, the Ordinal and OrdinalIgnoreCase work as expected (probably because they don’t need to use the new ICU library for handling Unicode).

Some sample code:

Console.WriteLine($"Ordinal Contains null char {"test".Contains("\0", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase Contains null char {"test".Contains("\0", StringComparison.OrdinalIgnoreCase)}");

Console.WriteLine($"CurrentCulture Contains null char {"test".Contains("\0", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase Contains null char {"test".Contains("\0", StringComparison.CurrentCultureIgnoreCase)}");

Console.WriteLine($"InvariantCulture Contains null char {"test".Contains("\0", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase Contains null char {"test".Contains("\0", StringComparison.InvariantCultureIgnoreCase)}");

Console.WriteLine($"Ordinal IndexOf null char {"test".IndexOf("\0t", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.OrdinalIgnoreCase)}");

Console.WriteLine($"CurrentCulture IndexOf null char {"test".IndexOf("\0", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.CurrentCultureIgnoreCase)}");

Console.WriteLine($"InvariantCulture IndexOf null char {"test".IndexOf("\0", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.InvariantCultureIgnoreCase)}");

and

Console.WriteLine($"Ordinal Contains null char {"test".Contains("\0test", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.OrdinalIgnoreCase)}");

Console.WriteLine($"CurrentCulture Contains null char {"test".Contains("\0test", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.CurrentCultureIgnoreCase)}");

Console.WriteLine($"InvariantCulture Contains null char {"test".Contains("\0test", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.InvariantCultureIgnoreCase)}");

Console.WriteLine($"Ordinal IndexOf null char {"test".IndexOf("\0t", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.OrdinalIgnoreCase)}");

Console.WriteLine($"CurrentCulture IndexOf null char {"test".IndexOf("\0test", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.CurrentCultureIgnoreCase)}");

Console.WriteLine($"InvariantCulture IndexOf null char {"test".IndexOf("\0test", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.InvariantCultureIgnoreCase)}");

Leave a Comment