Just urlencode the string you want to use as a filename. All characters returned by urlencode are valid in filenames (NTFS/HFS/UNIX), and you can later urldecode the filenames back to UTF-8 (or whatever encoding they were in).
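As a minimal sketch of that round trip (the helper names filename_encode and filename_decode are made up for illustration; only urlencode and urldecode come from the text above):

```php
<?php
// Hypothetical helpers: thin wrappers so the intent is obvious at call sites.
function filename_encode(string $name): string
{
    return urlencode($name);
}

function filename_decode(string $filename): string
{
    return urldecode($filename);
}

// "résumé 2024/final?.txt", written with escapes so the source file's own
// encoding does not matter.
$original = "r\u{e9}sum\u{e9} 2024/final?.txt";

// Only ASCII letters, digits and - _ . + % survive encoding.
$safe = filename_encode($original);

// Decoding gives back the exact original bytes.
$roundTrip = filename_decode($safe);
```

The slash and question mark, which are not allowed in filenames on one platform or another, come out as %2F and %3F, and the space becomes +.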
Caveats (all apply to the solutions below as well):
- After url-encoding, the filename must be no longer than 255 characters (the encoded name is pure ASCII, so that is also 255 bytes).
- Unicode has multiple representations for many characters (using combining characters). If you don’t normalize your UTF-8, you may have trouble searching with glob or reopening an individual file.
- You can’t rely on scandir or similar functions for alpha-sorting. You must urldecode the filenames, then use a sorting algorithm aware of UTF-8 (and collations).
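The three caveats above could be handled along these lines. This is only a sketch: make_filename and list_decoded are hypothetical names, the Normalizer and Collator classes are assumed to come from PHP's intl extension (the code falls back when they are missing), and the 'en_US' locale is an arbitrary default.

```php
<?php
// Hypothetical helper: normalize, url-encode, and enforce the length limit.
function make_filename(string $name, int $maxBytes = 255): string
{
    // Normalizer needs the intl extension; skip the step if it is unavailable.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($name, Normalizer::FORM_C);
        if ($normalized !== false) {
            $name = $normalized;
        }
    }

    $encoded = urlencode($name);

    // urlencode() output is pure ASCII, so strlen() counts characters and bytes alike.
    if (strlen($encoded) > $maxBytes) {
        throw new LengthException("Encoded filename exceeds $maxBytes bytes");
    }
    return $encoded;
}

// Hypothetical helper: list a directory as decoded names in collation order.
function list_decoded(string $dir, string $locale = 'en_US'): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        throw new RuntimeException("Cannot read directory: $dir");
    }

    $names = [];
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $names[] = urldecode($entry);
    }

    if (class_exists('Collator')) {
        (new Collator($locale))->sort($names); // locale-aware ordering
    } else {
        sort($names, SORT_STRING); // byte-order fallback, not collation-aware
    }
    return $names;
}
```

Normalizing before encoding means the same visible name always maps to the same filename, which is what makes glob and reopening reliable.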
Worse Solutions
The following are less attractive solutions, more complicated and with more caveats.
On Windows, the PHP filesystem wrapper expects and returns ISO-8859-1 strings for file/directory names. This gives you two choices:
- Use UTF-8 freely in your filenames, but understand that non-ASCII characters will appear incorrect outside PHP. A non-ASCII UTF-8 char will be stored as multiple single ISO-8859-1 characters. E.g. ó will appear as ó in Windows Explorer.
- Limit your file/directory names to characters representable in ISO-8859-1. In practice, you’ll pass your UTF-8 strings through utf8_decode before using them in filesystem functions, and pass the entries scandir gives you through utf8_encode to get the original filenames in UTF-8.
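To make the two choices concrete, here is a small sketch of what happens to ó in each case. It swaps in mb_convert_encoding (from the mbstring extension) with 'ISO-8859-1' in place of utf8_decode and utf8_encode, because those two functions are deprecated as of PHP 8.2; the variable names are illustrative only.

```php
<?php
$utf8 = "\u{f3}"; // "ó" encoded as UTF-8: the two bytes C3 B3

// Choice 1: the raw UTF-8 bytes get interpreted as two ISO-8859-1 characters.
// Re-reading them as ISO-8859-1 shows what a viewer outside PHP would display.
$asSeenOutsidePhp = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Choice 2: convert to ISO-8859-1 before the filesystem call (single byte F3) ...
$forFilesystem = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// ... and convert names coming back from the filesystem to UTF-8 again.
$restored = mb_convert_encoding($forFilesystem, 'UTF-8', 'ISO-8859-1');
```

Here $asSeenOutsidePhp holds the two characters ó, while $restored is identical to the original string.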
Caveats galore!
- If any byte passed to a filesystem function matches an invalid Windows filesystem character in ISO-8859-1, you’re out of luck.
- Windows may use an encoding other than ISO-8859-1 in non-English locales. I’d guess it will usually be one of ISO-8859-#, but this means you’ll need to use mb_convert_encoding instead of utf8_decode.
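A sketch of that conversion with the filesystem encoding passed in as a parameter. The helper names are hypothetical, 'Windows-1251' is only an example of a non-Latin-1 encoding, and finding out which encoding a given Windows installation actually uses is left open here. The round-trip comparison is one way to catch names the target encoding cannot represent.

```php
<?php
// Hypothetical helper: UTF-8 name -> bytes in the filesystem's encoding.
function to_fs_encoding(string $utf8Name, string $fsEncoding): string
{
    $converted = mb_convert_encoding($utf8Name, $fsEncoding, 'UTF-8');

    // Unrepresentable characters are substituted during conversion, so a
    // round trip that does not reproduce the input means information was lost.
    if (mb_convert_encoding($converted, 'UTF-8', $fsEncoding) !== $utf8Name) {
        throw new InvalidArgumentException("Name is not representable in $fsEncoding");
    }
    return $converted;
}

// Hypothetical helper: name read from the filesystem -> UTF-8.
function from_fs_encoding(string $fsName, string $fsEncoding): string
{
    return mb_convert_encoding($fsName, 'UTF-8', $fsEncoding);
}

// Example: the Russian word "Привет", written with escapes.
$name = "\u{41f}\u{440}\u{438}\u{432}\u{435}\u{442}";
$onDisk = to_fs_encoding($name, 'Windows-1251');
$back = from_fs_encoding($onDisk, 'Windows-1251');
```

Calling to_fs_encoding($name, 'ISO-8859-1') with the same Cyrillic name throws, since none of those letters exist in ISO-8859-1.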
This nightmare is why you should probably just transliterate to create filenames.
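One way transliteration might look, as a rough sketch. The function name transliterate_filename and the 'unnamed' fallback are invented for illustration, and what iconv's //TRANSLIT produces depends on the platform's iconv implementation and locale, so only the shape of the result is guaranteed: ASCII letters, digits, dots, underscores and hyphens.

```php
<?php
// Hypothetical helper: reduce an arbitrary UTF-8 string to a plain ASCII filename.
function transliterate_filename(string $name): string
{
    $ascii = false;
    if (function_exists('iconv')) {
        // Platform-dependent: the same input can transliterate differently
        // on different systems. Errors are suppressed and handled below.
        $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $name);
    }
    if ($ascii === false) {
        $ascii = $name; // leftover non-ASCII bytes are replaced in the next step
    }

    // Collapse every run of characters outside the safe set into one hyphen.
    $safe = preg_replace('/[^A-Za-z0-9._-]+/', '-', $ascii);
    $safe = trim((string) $safe, '-');

    return $safe === '' ? 'unnamed' : $safe;
}
```

Unlike url-encoding, this is lossy: the original name cannot be recovered from the filename, and two different names can collapse to the same result, so keep the original title elsewhere if you need it.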