How Can I Best Guess the Encoding when the BOM (Byte Order Mark) is Missing?

Maybe you can shell out to a Python script that uses Chardet: Universal Encoding Detector. It is a reimplementation of the character encoding detection that is used by Firefox, and it is used by many different applications. Useful links: Mozilla’s code, research paper it was based on (ironically, my Firefox fails to correctly detect the encoding of …
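A minimal sketch of the "shell out" idea from C#, under several assumptions: a `python` executable is on the PATH, the `chardet` package is installed for it, and the runtime is .NET Core 2.1 or later (for `ProcessStartInfo.ArgumentList`). The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Diagnostics;

static class ChardetBridge
{
    // One-line Python program: read the file as raw bytes and print chardet's guess.
    // chardet.detect returns a dict; its 'encoding' entry may be None when no guess is made.
    private const string Script =
        "import sys, chardet; " +
        "print(chardet.detect(open(sys.argv[1], 'rb').read())['encoding'])";

    // Builds the process description without starting anything.
    // "python" is an assumed executable name; pass another one if needed.
    public static ProcessStartInfo BuildStartInfo(string path, string pythonExe = "python")
    {
        var info = new ProcessStartInfo
        {
            FileName = pythonExe,
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };
        info.ArgumentList.Add("-c");
        info.ArgumentList.Add(Script);
        info.ArgumentList.Add(path);
        return info;
    }

    // Runs the script and returns whatever name chardet printed (or "None").
    public static string GuessEncodingName(string path)
    {
        using var process = Process.Start(BuildStartInfo(path))
            ?? throw new InvalidOperationException("Could not start the Python interpreter.");
        string output = process.StandardOutput.ReadToEnd().Trim();
        process.WaitForExit();
        return output;
    }
}
```

The returned name is only a statistical guess, so a caller would typically treat it as a hint and still handle decoding failures.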

XDocument: saving XML to file without BOM

Use an XmlTextWriter and pass that to the XDocument’s Save() method; that way you have more control over the encoding used:

    var doc = new XDocument(
        new XDeclaration("1.0", "utf-8", null),
        new XElement("root", new XAttribute("note", "boogers"))
    );
    using (var writer = new XmlTextWriter(".\\boogers.xml", new UTF8Encoding(false)))
    {
        doc.Save(writer);
    }

The UTF8Encoding class constructor has …
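As a hedged variation on the same idea, the sketch below uses `XmlWriter.Create` with `XmlWriterSettings` instead of `XmlTextWriter`, and writes to a `MemoryStream` so the resulting bytes can be inspected. The `XmlNoBom` class and `Serialize` method are hypothetical names introduced for this example.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class XmlNoBom
{
    // Serializes the document as UTF-8 and returns the raw bytes.
    public static byte[] Serialize(XDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            // false = do not emit the UTF-8 identifier (BOM).
            Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false),
            Indent = true,
        };

        using var stream = new MemoryStream();
        // Dispose the writer before reading the stream so everything is flushed.
        using (var writer = XmlWriter.Create(stream, settings))
        {
            doc.Save(writer);
        }
        return stream.ToArray();
    }
}
```

If the first byte of the result is `<` rather than 0xEF, the XML declaration starts the output and no BOM was written.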

How to GetBytes() in C# with UTF8 encoding with BOM?

Try it like this:

    public ActionResult Download()
    {
        var data = Encoding.UTF8.GetBytes("some data");
        var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
        return File(result, "application/csv", "foo.csv");
    }

The reason is that the UTF8Encoding constructor that takes a boolean parameter doesn’t do what you would expect:

    byte[] bytes = new UTF8Encoding(true).GetBytes("a");

The resulting array would contain a single byte with the value …
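The difference can be checked outside of a controller. The following is a small sketch with hypothetical helper names (`BomBytes`, `Plain`, `WithBom`); it relies only on `GetBytes` and `GetPreamble`, which behave as described above.

```csharp
using System.Linq;
using System.Text;

static class BomBytes
{
    private static readonly UTF8Encoding Utf8WithBom =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);

    // GetBytes encodes the characters only; it never prepends the preamble,
    // even though this encoding instance is configured to emit one.
    public static byte[] Plain(string text) => Utf8WithBom.GetBytes(text);

    // Prepend the preamble (EF BB BF for UTF-8) explicitly.
    public static byte[] WithBom(string text) =>
        Utf8WithBom.GetPreamble().Concat(Utf8WithBom.GetBytes(text)).ToArray();
}
```

For the input "a", `Plain` yields one byte and `WithBom` yields four: the three preamble bytes followed by the encoded character.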

Write text files without Byte Order Mark (BOM)?

In order to omit the byte order mark (BOM), your stream must use an instance of UTF8Encoding other than System.Text.Encoding.UTF8 (which is configured to generate a BOM). There are two easy ways to do this:

1. Explicitly specifying a suitable encoding: Call the UTF8Encoding constructor with False for the encoderShouldEmitUTF8Identifier parameter. Pass the UTF8Encoding instance …
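The explicit-encoding approach can be sketched in C# as follows. This is an illustration, not the original answer's code: `NoBomWriter` and `Write` are hypothetical names, and a `MemoryStream` stands in for a file so the bytes are easy to inspect.

```csharp
using System.IO;
using System.Text;

static class NoBomWriter
{
    // Writes text through a StreamWriter using the given encoding
    // and returns the bytes that ended up in the stream.
    public static byte[] Write(string text, Encoding encoding)
    {
        using var stream = new MemoryStream();
        // Disposing the writer flushes it; MemoryStream.ToArray still works afterwards.
        using (var writer = new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        }
        return stream.ToArray();
    }
}
```

Calling `NoBomWriter.Write("hi", new UTF8Encoding(false))` should give just the two encoded characters, while passing `Encoding.UTF8` should give the same bytes preceded by the three-byte preamble.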