Check if a String is valid UTF-8 encoded in Java

Only byte data can be checked for UTF-8 validity. Once you have constructed a String, it's already represented as a sequence of UTF-16 code units.

Likewise, only byte data, such as a byte array, can be UTF-8 encoded.

Here is a common case of UTF-8 conversion, encoding a String into UTF-8 bytes:

import java.nio.charset.StandardCharsets;

String myString = "\u0048\u0065\u006C\u006C\u006F World";
System.out.println(myString); // prints "Hello World"

// StandardCharsets.UTF_8 (Java 7+) avoids the checked UnsupportedEncodingException
byte[] myBytes = myString.getBytes(StandardCharsets.UTF_8);

// Print each encoded byte as a signed decimal value
for (byte b : myBytes) {
    System.out.println(b);
}
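
Going the other way, one way to check whether a byte array holds valid UTF-8 is to decode it with a strict CharsetDecoder that reports errors instead of silently replacing bad bytes. The following is a minimal sketch, not the only approach; the class name Utf8Check and the helper isValidUtf8 are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {

    // Hypothetical helper: returns true if the bytes decode as UTF-8 without errors.
    public static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD for bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "Hello World".getBytes(StandardCharsets.UTF_8);
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] bad = {(byte) 0xC3, (byte) 0x28};

        System.out.println(isValidUtf8(good));
        System.out.println(isValidUtf8(bad));
    }
}
```

Note that new String(bytes, StandardCharsets.UTF_8) would not work for this check, because it replaces malformed input with a replacement character rather than failing.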

If you don’t know the encoding of your byte array, juniversalchardet is a library to help you detect it.
