Setting UTF-8 encoding in Java for a CSV file [duplicate]

I spent some time on this and found a solution to your problem.

First, I opened Notepad and wrote the following line: שלום, hello, привет
Then I saved it as he-en-ru.csv using the UTF-8 encoding.
Then I opened it with MS Excel and everything was displayed correctly.

Next, I wrote a simple Java program that writes the same line to a file as follows:

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
    w.print(line);
    w.flush();
    w.close();

When I opened this file in Excel, I saw gibberish.

Then I compared the raw bytes of the two files and (as expected) saw that the file generated by Notepad starts with a 3-byte prefix (decimal and hex values shown):

    239 EF
    187 BB
    191 BF
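
The byte check described above can be reproduced with a short sketch like the one below. The class name `BomInspector` and the method `dumpPrefix` are hypothetical names chosen for illustration, and the sketch assumes the file is small enough to read into memory in one go.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of any library.
class BomInspector {

    // Returns the first `count` bytes of the file (or fewer if the file is
    // shorter), one per line, formatted as "decimal HEX", e.g. "239 EF".
    static String dumpPrefix(Path file, int count) throws IOException {
        byte[] all = Files.readAllBytes(file); // assumes a small file
        int n = Math.min(count, all.length);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = all[i] & 0xFF; // Java bytes are signed; convert to 0..255
            sb.append(String.format("%d %02X\n", b, b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BomInspector <file>");
            return;
        }
        // Print the first three bytes of the given file.
        System.out.print(dumpPrefix(Paths.get(args[0]), 3));
    }
}
```

Running it against both files should make the difference visible: only a file saved with the prefix starts with the three values listed above.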

So, I modified my code to print this prefix first and the text after that:

    String line = "שלום, hello, привет";
    OutputStream os = new FileOutputStream("c:/temp/j.csv");
    // Write the 3-byte UTF-8 prefix (EF BB BF) before any text.
    os.write(239);
    os.write(187);
    os.write(191);

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));

    w.print(line);
    w.flush();
    w.close(); // also closes the underlying FileOutputStream

And it worked! I opened the file in Excel and saw the text as expected.

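Bottom line: write these 3 bytes before writing the content. This prefix is the UTF-8 byte order mark (BOM). It indicates that the content is 'UTF-8 with BOM' (without it, the file is just 'UTF-8 without BOM'), and Excel apparently needs it to recognize the file as UTF-8.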
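
For reuse, the same idea can be wrapped in a small helper. This is only a sketch: the class `BomCsvWriter` and its methods are hypothetical names, and it swaps in `java.nio.file.Files`, `StandardCharsets.UTF_8` and try-with-resources instead of the `FileOutputStream` and `"UTF-8"` string used above, so the stream is closed even if writing fails.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of any library.
class BomCsvWriter {

    // The three prefix bytes: EF BB BF (239 187 191).
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Writes the prefix bytes first, then the text encoded as UTF-8.
    static void writeWithBom(Path file, String text) throws IOException {
        try (OutputStream os = Files.newOutputStream(file);
             Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
            os.write(UTF8_BOM); // raw bytes go straight to the stream
            w.write(text);      // encoded text follows once the writer is flushed on close
        }
    }

    // Returns true if the file starts with the three prefix bytes.
    static boolean startsWithBom(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file); // assumes a small file
        return all.length >= 3
                && all[0] == UTF8_BOM[0]
                && all[1] == UTF8_BOM[1]
                && all[2] == UTF8_BOM[2];
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BomCsvWriter <file> <text>");
            return;
        }
        Path file = Paths.get(args[0]);
        writeWithBom(file, args[1]);
        System.out.println("starts with BOM: " + startsWithBom(file));
    }
}
```

With this in place, a call such as `BomCsvWriter.writeWithBom(Paths.get("c:/temp/j.csv"), line)` would stand in for the manual `os.write` calls and the `PrintWriter` code.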
