Serializing an object as UTF-8 XML in .NET

No, you can use a StringWriter to get rid of the intermediate MemoryStream. However, to force it into XML you need to use a StringWriter which overrides the Encoding property:

public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

Or if you’re not using C# 6 yet:

public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding { get { return Encoding.UTF8; } }
}

Then:

var serializer = new XmlSerializer(typeof(SomeSerializableObject));
string utf8;
using (StringWriter writer = new Utf8StringWriter())
{
    serializer.Serialize(writer, entry);
    utf8 = writer.ToString();
}

Obviously you can make Utf8StringWriter into a more general class which accepts any encoding in its constructor – but in my experience UTF-8 is by far the most commonly required “custom” encoding for a StringWriter 🙂

Now as Jon Hanna says, this will still be UTF-16 internally, but presumably you’re going to pass it to something else at some point, to convert it into binary data… at that point you can use the above string, convert it into UTF-8 bytes, and all will be well – because the XML declaration will specify “utf-8” as the encoding.

EDIT: A short but complete example to show this working:

using System;
using System.Text;
using System.IO;
using System.Xml.Serialization;

public class Test
{    
    public int X { get; set; }

    static void Main()
    {
        Test t = new Test();
        var serializer = new XmlSerializer(typeof(Test));
        string utf8;
        using (StringWriter writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, t);
            utf8 = writer.ToString();
        }
        Console.WriteLine(utf8);
    }


    public class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding => Encoding.UTF8;
    }
}

Result:

<?xml version="1.0" encoding="utf-8"?>
<Test xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <X>0</X>
</Test>

Note the declared encoding of “utf-8” which is what we wanted, I believe.

Leave a Comment