Well, I assume it’s because the raw binary data includes the BOM. Encoding.GetString decodes every byte it is given, so the BOM comes out as a leading U+FEFF character. You could always remove the BOM yourself after decoding, if you don’t want it, but you should consider whether the byte array should contain the BOM to start with.
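As a rough sketch of removing the BOM yourself after decoding: a decoded UTF-8 BOM shows up as a single leading U+FEFF character, so it can be dropped with a plain character check. StripBom and StripBomDemo are hypothetical names used only for this illustration; they are not framework APIs.

```csharp
using System;
using System.Text;

public class StripBomDemo
{
    // Hypothetical helper (not part of the framework): drops one leading
    // U+FEFF character, which is what a decoded BOM turns into.
    public static string StripBom(string text)
    {
        // Compare the char directly rather than calling StartsWith,
        // which is culture-sensitive by default.
        return text.Length > 0 && text[0] == '\uFEFF'
            ? text.Substring(1)
            : text;
    }

    public static void Main()
    {
        byte[] withBom = { 0xef, 0xbb, 0xbf, 0x41 };
        string decoded = Encoding.UTF8.GetString(withBom);
        string stripped = StripBom(decoded);

        Console.WriteLine(decoded.Length);  // 2: U+FEFF followed by 'A'
        Console.WriteLine(stripped.Length); // 1: just 'A'
    }
}
```

The check is done on the char itself because a culture-sensitive StartsWith("\uFEFF") treats U+FEFF as ignorable and so can report a match even when no BOM is present.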
EDIT: Alternatively, you could use a StreamReader to perform the decoding. Here’s an example, showing the same byte array being converted into two characters using Encoding.GetString or one character via a StreamReader:
    using System;
    using System.IO;
    using System.Text;

    class Test
    {
        static void Main()
        {
            // UTF-8 BOM (EF BB BF) followed by 'A' (0x41)
            byte[] withBom = { 0xef, 0xbb, 0xbf, 0x41 };

            // GetString decodes the BOM as a U+FEFF character
            string viaEncoding = Encoding.UTF8.GetString(withBom);
            Console.WriteLine(viaEncoding.Length); // 2

            // StreamReader detects the BOM and skips it
            string viaStreamReader;
            using (StreamReader reader = new StreamReader
                   (new MemoryStream(withBom), Encoding.UTF8))
            {
                viaStreamReader = reader.ReadToEnd();
            }
            Console.WriteLine(viaStreamReader.Length); // 1
        }
    }