How to open a file in PHP that has Unicode characters in its name?

UPDATE (July 13 ’17)

Although the docs do not seem to mention it prominently, PHP 7.1 and above finally support Unicode filenames on Windows out of the box. PHP’s filesystem functions accept and return filenames according to default_charset, which is UTF-8 by default.

Refer to bug fix here: https://github.com/php/php-src/commit/3d3f11ede4cc7c83d64cc5edaae7c29ce9c6986f
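
As a rough illustration of what that means in practice, here is a minimal sketch that writes, lists, and reads back a file with a UTF-8 name. It assumes PHP 7.1 or later, that default_charset has been left at UTF-8, and that this script itself is saved as UTF-8. The temp directory and the file contents are placeholders chosen for the example.

```php
<?php
// Assumption: PHP 7.1+ with default_charset left at its default (UTF-8),
// and this source file saved as UTF-8.
$dir  = sys_get_temp_dir();
$path = $dir . DIRECTORY_SEPARATOR . "你好.xml";

// Write and read the file using the UTF-8 name directly, with no conversion.
file_put_contents($path, "<greeting>hello</greeting>");
$contents = file_get_contents($path);

// scandir() should hand the same UTF-8 name back.
$listed = in_array("你好.xml", scandir($dir), true);

// Clean up the example file.
unlink($path);
```

If $listed comes back false or the read fails on an older installation, the conversion approach in the original answer below is probably still needed.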


UPDATE (Jan 29 ’15)

If you have access to the PHP extensions directory, you can try installing php-wfio.dll from https://github.com/kenjiuno/php-wfio and referring to files via the wfio:// protocol:

file_get_contents("wfio://你好.xml");
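
If the same script may run on machines without the extension, one option is to check for the wrapper first. This is only a sketch: wfioPath is a hypothetical helper name, and it assumes the extension registers its stream wrapper under the name "wfio", as the wfio:// prefix suggests.

```php
<?php
// Hypothetical helper: prefixes the name with wfio:// only when a stream
// wrapper called "wfio" is registered (assumed to mean php-wfio is loaded).
function wfioPath($name)
{
    if (in_array('wfio', stream_get_wrappers(), true)) {
        return 'wfio://' . $name;
    }
    // Fall back to the plain name and let PHP's own file access handle it.
    return $name;
}

$path = wfioPath("你好.xml");
```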

Original Answer

Older PHP versions on Windows use the legacy “ANSI” APIs exclusively for local file access, which means filenames are interpreted in the system locale’s code page rather than as Unicode.

To access files whose filenames contain Unicode characters, you must convert the filename to the encoding of the current system locale. If the filename contains characters that are not representable in that encoding, you’re out of luck (Update: see the sections above for a solution). scandir will return gibberish for these files, and passing that string back to fopen and its equivalents will fail.

To find the right encoding to use, get the system locale by calling <?=setlocale(LC_CTYPE,0)?> and look up the Code Page Identifier (the number after the .) in the MSDN article at https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx.
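
The lookup can be partly automated. The following is a minimal sketch, not a built-in: codePageFromLocale is a hypothetical helper that assumes the locale string ends in a dot followed by the code page digits, which is the form Windows typically reports.

```php
<?php
// Hypothetical helper (not a PHP built-in): pulls the code page number out of
// a Windows-style locale string such as "Chinese (Traditional)_HKG.950".
function codePageFromLocale($locale)
{
    // The code page is the run of digits after the final dot, if there is one.
    if (is_string($locale) && preg_match('/\.(\d+)$/', $locale, $m) === 1) {
        return $m[1];
    }
    return null; // no numeric code page found (e.g. the "C" locale)
}

// "0" asks setlocale() to report the current setting without changing it.
$current  = setlocale(LC_CTYPE, '0');
$codePage = codePageFromLocale($current);

echo $codePage === null
    ? "No code page found in locale: $current\n"
    : "Code page: $codePage\n";
```

The number still has to be mapped to an encoding name that iconv understands, so the table in the MSDN article remains the reference.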

For example, if the function returns Chinese (Traditional)_HKG.950, this means that code page 950 is in use and the filename should be converted to the Big5 encoding. In that case, your code will have to be as follows if your PHP source file is saved in UTF-8 (preferably without a BOM):

$fname = iconv('UTF-8','big-5',"你好.xml");
file_get_contents($fname);

or as follows if you save the PHP source file directly as Big5:

$fname = "你好.xml";
file_get_contents($fname);
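
Because the conversion fails for names the code page cannot represent, it may be worth detecting that case explicitly instead of passing a bad name on. Below is a minimal sketch under that assumption. convertFilename is a hypothetical helper, and the behaviour on unrepresentable characters depends on the iconv implementation PHP was built with: some return false, which is what this sketch checks for.

```php
<?php
// Hypothetical helper (not a PHP built-in): converts a UTF-8 filename to the
// given encoding and returns null when the conversion fails.
function convertFilename($utf8Name, $encoding)
{
    // iconv() returns false and raises a notice when it cannot convert the
    // input; @ suppresses the notice so the caller can handle the failure.
    $converted = @iconv('UTF-8', $encoding, $utf8Name);
    return $converted === false ? null : $converted;
}

$fname = convertFilename("你好.xml", 'BIG5');
if ($fname === null) {
    // The name cannot be expressed in this code page on this system.
    echo "Filename is not representable in the target encoding\n";
} elseif (is_file($fname)) {
    $contents = file_get_contents($fname);
}
```

The is_file check only keeps the example from emitting a warning when the file does not exist.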
