Unicode encoding for filesystem in Mac OS X not correct in Python?

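Mac OS X (on HFS+ volumes) stores filenames as decomposed UTF-8, a variant of Unicode normalization form NFD, so an accented character such as “é” is stored as the base letter followed by a combining accent. If you need to, for example, read in filenames and write them to a “normal” (precomposed, NFC) UTF-8 file, you must normalize them first: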

import unicodedata
filename = unicodedata.normalize('NFC', unicode(filename, 'utf-8')).encode('utf-8')

from here: https://web.archive.org/web/20120423075412/http://boodebr.org/main/python/all-about-python-and-unicode
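
The one-liner above relies on Python 2's `unicode` built-in. For Python 3, where filenames are usually already `str`, a minimal sketch of the same normalization might look like the following. The helper name `to_nfc` is hypothetical (it is not from the linked article), and the sketch assumes that any byte-string filename is UTF-8 encoded.

```python
import unicodedata


def to_nfc(filename):
    """Return the filename as an NFC-normalized str.

    Hypothetical helper: accepts either bytes (assumed to be UTF-8)
    or str, and composes base letters with their combining marks.
    """
    if isinstance(filename, bytes):
        filename = filename.decode("utf-8")
    return unicodedata.normalize("NFC", filename)


# "e" followed by U+0301 (combining acute accent) is the decomposed
# spelling of "é", as a decomposing filesystem would return it.
decomposed = "cafe\u0301"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 5 4
print(composed == "caf\u00e9")         # True
```

Comparing filenames after passing both sides through the same normalization avoids mismatches between strings that look identical but differ in composition.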
