How to write a std::string to a UTF-8 text file

The only way UTF-8 affects std::string is that size(), length(), and all the indices are measured in bytes, not characters.

And, as sbi points out, incrementing the iterator provided by std::string will step forward by byte, not by character, so it can actually point into the middle of a multibyte UTF-8 codepoint. There’s no UTF-8-aware iterator provided in the standard library, but there are a few available on the ‘Net.

If you remember that, you can put UTF-8 into std::string, write it to a file, etc. all in the usual way (by which I mean the way you’d use a std::string without UTF-8 inside).

You may want to start your file with a byte order mark so that other programs will know it is UTF-8.

Leave a Comment