transformer.setOutputProperty(OutputKeys.ENCODING, “UTF-8”) is NOT working

I had the same problem on Android when serializing emoji characters. When using UTF-8 encoding in the transformer the output was HTML character entities (UTF-16 surrogate pairs), which would subsequently break other parsers that read the data.

This is how I ended up solving it:

StringWriter sw = new StringWriter();
sw.write("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>");
Transformer t = TransformerFactory.newInstance().newTransformer();

// this will work because we are creating a Java string, not writing to an output
t.setOutputProperty(OutputKeys.ENCODING, "UTF-16"); 
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.transform(new DOMSource(elementNode), new StreamResult(sw));

return IOUtils.toInputStream(sw.toString(), Charset.forName("UTF-8"));

Leave a Comment