What is XML BOM and how do I detect it?

For an ANSI-encoded XML file, the BOM should be removed. If you use UTF-8, you don't really need it. It is only needed for UTF-16 and UTF-32.

The Byte-Order-Mark (or BOM), is a special marker added at the very beginning of an Unicode file encoded in UTF-8, UTF-16 or UTF-32. It is used to indicate whether the file uses the big-endian or little-endian byte order. The BOM is mandatory for UTF-16 and UTF-32, but it is optional for UTF-8.

(Source: https://www.opentag.com/xfaq_enc.htm#enc_bom)

As for how to detect this in Java:

See the answers to this related question: "Java: How to determine the correct charset encoding of a stream".

Basically, read the first few bytes yourself and check whether they match a known BOM.
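Reading the first few bytes and comparing them against the known BOM signatures could look roughly like the sketch below. The class name BomDetector and the method detect are made up for illustration and do not come from any library. The byte values are the standard Unicode BOM signatures. The sketch assumes XML input, where a NUL character cannot follow the BOM, so FF FE 00 00 is treated as UTF-32LE rather than UTF-16LE.

```java
final class BomDetector {

    /**
     * Returns the encoding implied by a BOM at the start of head,
     * or null if no BOM is found. (Hypothetical helper, not a library API.)
     */
    static String detect(byte[] head) {
        // Check the 4-byte UTF-32 marks first: FF FE 00 00 also starts
        // with the 2-byte UTF-16LE mark FF FE.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null; // no BOM present
    }

    // Compares the leading bytes of data (as unsigned values) with prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

To use it on a file or stream, read the first four bytes (for example with InputStream.readNBytes(4) on Java 11 or later) and pass them to detect. A null result only means that no BOM is present; it says nothing about the actual encoding.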
