How can I open files containing accents in Java?

First, the character encoding used is not directly related to the locale. So changing the locale won’t help much.

Second, the � sequence is typical for the Unicode replacement character U+FFFD (�) being printed in ISO-8859-1 instead of UTF-8. Here’s the evidence:

System.out.println(new String("�".getBytes("UTF-8"), "ISO-8859-1")); // prints �
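To make the byte-level reason visible, here is a minimal sketch of the same experiment. The class and method names (ReplacementCharDemo, utf8Hex, misdecode) are made up for illustration and are not part of the question. It uses Unicode escapes so that the result does not depend on how the source file itself is saved.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {

    // Encodes the text as UTF-8 and returns the bytes as space-separated hex.
    static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Written as an escape so the source file encoding does not matter.
        String replacement = "\uFFFD";

        System.out.println(utf8Hex(replacement)); // EF BF BD

        // Each of the three bytes becomes one ISO-8859-1 character.
        String garbled = misdecode(replacement);
        System.out.println(garbled.length()); // 3
        System.out.println(garbled.equals("\u00EF\u00BF\u00BD")); // true
    }
}
```

The three bytes EF BF BD map one-to-one onto the ISO-8859-1 characters ï, ¿ and ½, which is why a single replacement character shows up as three characters on a misconfigured console.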

So there are two problems:

  1. Your JVM is reading those special characters as �.
  2. Your console is using ISO-8859-1 to display characters.

For a Sun JVM, the VM argument -Dfile.encoding=UTF-8 should fix the first problem. The second problem has to be fixed in the console settings. If you’re using Eclipse, for example, you can change it under Window > Preferences > General > Workspace > Text File Encoding. Set it to UTF-8 as well.
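To check whether the VM argument actually took effect, you can print what the JVM believes its default encoding is. The sketch below also shows one possible way to make the program's own output UTF-8 regardless of that default, by wrapping System.out in a PrintStream with an explicit encoding. The names EncodingCheck and utf8Stream are hypothetical, and the console still has to be set to UTF-8 for the characters to display correctly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class EncodingCheck {

    // Wraps a stream in a PrintStream that always encodes text as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // What the JVM picked up from the platform or from -Dfile.encoding.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Output written here is UTF-8 encoded whatever the default is.
        PrintStream out = utf8Stream(System.out);
        out.println("\u00E9\u00E8\u00E0"); // three accented letters: e-acute, e-grave, a-grave
    }
}
```

Run it once with and once without -Dfile.encoding=UTF-8 and compare the two reported values.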


Update: regarding the code in your update:

byte[] textArray = f.getName().getBytes();

That should have been the following, to rule out the influence of the platform default encoding:

byte[] textArray = f.getName().getBytes("UTF-8");
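As a rough, self-contained sketch of that idea, the following lists a directory and prints each file name's UTF-8 bytes. The class name FileNameBytes and the helper nameAsUtf8 are made up for illustration, and the directory defaults to the current one. It uses the StandardCharsets.UTF_8 overload of getBytes (available since Java 7), which, unlike getBytes("UTF-8"), does not throw a checked UnsupportedEncodingException.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FileNameBytes {

    // Encodes a file name with an explicit charset, independent of the platform default.
    static byte[] nameAsUtf8(File f) {
        return f.getName().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : files) {
            // Bytes are printed as signed decimals, so accented letters show up as negative values.
            System.out.println(Arrays.toString(nameAsUtf8(f)));
        }
    }
}
```

If an accented letter such as é appears as the two bytes -61, -87 (0xC3 0xA9), the name was read correctly and only the display is wrong. If it appears as -17, -65, -67 (0xEF 0xBF 0xBD), the JVM already replaced the character while reading the file name.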

If that still displays the same, then the problem lies deeper. Which JVM exactly are you using? Run java -version to check. As said before, the -Dfile.encoding argument is Sun JVM specific. Some Linux machines ship with the GNU JVM or OpenJDK’s JVM, and this argument may not work there.
