First, the character encoding used is not directly related to the locale. So changing the locale won’t help much.
Second, the ï¿½ is typical for the Unicode replacement character U+FFFD (�) being encoded in UTF-8 but printed in ISO-8859-1. Here's the evidence:
System.out.println(new String("�".getBytes("UTF-8"), "ISO-8859-1")); // ï¿½
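To see that round trip end to end, here's a minimal, self-contained sketch (the class name MojibakeDemo is just illustrative):

```java
import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // U+FFFD encodes to three bytes in UTF-8: EF BF BD.
        byte[] utf8Bytes = "\uFFFD".getBytes("UTF-8");
        // Decoding those three bytes as ISO-8859-1 yields one character per byte.
        String garbled = new String(utf8Bytes, "ISO-8859-1");
        System.out.println(garbled); // ï¿½
    }
}
```

That is exactly the three-characters-per-special-character pattern you see in the broken output.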
So there are two problems:
- Your JVM is reading those special characters as � (U+FFFD).
- Your console is using ISO-8859-1 to display characters.
For a Sun JVM, the VM argument -Dfile.encoding=UTF-8 should fix the first problem. The second problem has to be fixed in the console settings. If you're using Eclipse, for example, you can change it in Window > Preferences > General > Workspace > Text File Encoding. Set it to UTF-8 as well.
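To verify whether the VM argument actually took effect, you can print the JVM's default encoding from within Java (the class name EncodingCheck is just illustrative):

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // Reports what the JVM actually picked up, with or without -Dfile.encoding.
        System.out.println(System.getProperty("file.encoding"));
        System.out.println(Charset.defaultCharset());
    }
}
```

If this doesn't print UTF-8 after you passed the flag, the JVM ignored it.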
Update: As per your update:
byte[] textArray = f.getName().getBytes();
That should have been the following, to exclude the influence of the platform default encoding:
byte[] textArray = f.getName().getBytes("UTF-8");
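The difference matters whenever the platform default encoding is not UTF-8. A small sketch with a hypothetical non-ASCII file name (café.txt is just an example; it is not from your code):

```java
import java.io.UnsupportedEncodingException;

public class GetBytesDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String name = "caf\u00E9.txt"; // hypothetical file name containing é
        // Uses the platform default encoding -- result may differ per machine:
        byte[] platformBytes = name.getBytes();
        // Always UTF-8, regardless of platform:
        byte[] utf8Bytes = name.getBytes("UTF-8");
        // é is one char but two bytes in UTF-8, so the counts differ:
        System.out.println(name.length());      // 8
        System.out.println(utf8Bytes.length);   // 9
    }
}
```

The no-argument getBytes() is a common source of exactly this kind of platform-dependent corruption.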
If that still displays the same, then the problem lies deeper. Which JVM exactly are you using? Run java -version to check. As said before, the -Dfile.encoding argument is specific to the Sun JVM. Some Linux machines ship with the GNU JVM or OpenJDK's JVM, and this argument may not work there.