Is there a standard way to do an fopen with a Unicode string file path?

No, there’s no standard way. There are some differences between operating systems. Here’s how different OSs handle non-ASCII filenames.

Linux

Under Linux, a filename is simply a binary string. The convention on most modern distributions is to use UTF-8 for non-ASCII filenames. But in the beginning, it was common to encode filenames as ISO-8859-1. It’s basically up to each application to choose an encoding, so you can even have different encodings used on the same filesystem. The LANG environment variable can give you a hint what the preferred encoding is. But these days, you can probably assume UTF-8 everywhere.

This is not without problems, though, because a filename containing an invalid UTF-8 sequence is perfectly valid on most Linux filesystems. How would you specify such a filename if you only support UTF-8? Ideally, you should support both UTF-8 and binary filenames.

OS X

The HFS filesystem on OS X uses Unicode (UTF-16) filenames internally. Most C (and POSIX) library functions like fopen accept UTF-8 strings (since they’re 8-bit compatible) and convert them internally.

Windows

The Windows API uses UTF-16 for filenames, but fopen uses the current codepage, whatever that is (UTF-8 just became an option). Many C library functions have a non-standard equivalent that accepts UTF-16 (wchar_t on Windows). For example, _wfopen instead of fopen.

Leave a Comment