Removing invalid XML characters from a string in Java

Java's regex engine supports supplementary characters, so you can specify the ranges above U+FFFF as surrogate pairs, that is, two UTF-16 chars per code point. Here is the pattern for removing characters that are illegal in XML 1.0:

    // XML 1.0
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    String xml10pattern = "[^"
        + "\u0009\r\n"
        + "\u0020-\uD7FF"
        + "\uE000-\uFFFD"
        …
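
The excerpt above is cut off before the pattern is closed, so here is a minimal, self-contained sketch of how the quoted XML 1.0 character production could be turned into a working filter. The class name `XmlSanitizer` and the method `stripInvalidXml10` are hypothetical names chosen for illustration, and the final range (written as the surrogate pairs for U+10000 and U+10FFFF) is an assumption derived from the production in the comment, not the original author's remaining code.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
final class XmlSanitizer {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // The negated class matches every code point that is NOT allowed.
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^"
            + "\t\n\r"                      // #x9, #xA, #xD
            + "\u0020-\uD7FF"               // [#x20-#xD7FF]
            + "\uE000-\uFFFD"               // [#xE000-#xFFFD]
            + "\uD800\uDC00-\uDBFF\uDFFF"   // assumed: [#x10000-#x10FFFF] as surrogate pairs
            + "]");

    private XmlSanitizer() {
    }

    // Returns a copy of the input with all characters that XML 1.0 forbids removed.
    static String stripInvalidXml10(String input) {
        return INVALID_XML10.matcher(input).replaceAll("");
    }
}
```

Calling `XmlSanitizer.stripInvalidXml10(text)` before handing `text` to an XML writer should drop control characters such as U+0000 while leaving tabs, line breaks, and supplementary characters (for example emoji) intact. Compiling the pattern once into a static field avoids recompiling it on every call, which `String.replaceAll` would do.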