UnicodeDecodeError when performing os.walk

I just spent some time sorting through this error, and the wordier answers here don’t get at the underlying issue:

The problem is, if you pass a unicode string into os.walk(), then os.listdir() returns unicode for every name it can decode, but leaves any filename it cannot decode as a plain byte string. When os.walk then joins that byte string onto the unicode path, Python 2 implicitly decodes it as ASCII (hence the ‘ascii’ decode error). As soon as it hits a byte outside the ASCII range, it throws the exception.
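A minimal sketch of the failing step, using the same bytes as the test filename in the reproduction further down. Python 2 performs this ASCII decode implicitly whenever a byte string is combined with a unicode one; the explicit call here raises the same error. The names `bad_name` and `decode_failed` are only illustrative.

```python
# Illustrative only: bad_name stands in for a filename that os.listdir()
# could not decode and therefore returned as a byte string.
bad_name = b"\x8b\x8bThis"

try:
    # Python 2 does this decode implicitly when joining bytes onto unicode.
    bad_name.decode("ascii")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc)
```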

The solution is to force the starting path you pass to os.walk to be a regular byte string, i.e. os.walk(str(somepath)). This means os.listdir returns byte strings for every name, nothing gets implicitly decoded, and everything works the way it should.
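One way to apply this is sketched below. In Python 2, str() on a unicode path that itself contains non-ASCII characters raises UnicodeEncodeError, so this sketch encodes with the filesystem encoding instead. `walk_files_bytes` is a hypothetical helper, not part of the os module, and the UTF-8 fallback is an assumption for systems where no filesystem encoding is reported.

```python
import os
import sys


def walk_files_bytes(top):
    """Yield the full path of every file under top as a byte string."""
    # Hypothetical helper: coerce the start path to bytes so that
    # os.walk() and os.listdir() only ever hand back byte strings.
    if not isinstance(top, bytes):
        # Assumption: fall back to UTF-8 if no filesystem encoding is reported.
        top = top.encode(sys.getfilesystemencoding() or "utf-8")
    for root, dirs, filenames in os.walk(top):
        for name in filenames:
            yield os.path.join(root, name)
```

Because both the root and the names are byte strings, os.path.join never has to decode anything, which is the point of the fix described above.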

You can reproduce this problem (and show that the solution works) trivially like this:

  1. Go into bash in some directory and run touch $(echo -e "\x8b\x8bThis is a bad filename") which will make some test files.

  2. Now run the following Python 2 code (IPython Qt is handy for this) in the same directory:

    import os

    l = []
    # A unicode start path makes os.walk mix unicode and byte strings.
    for root, dirs, filenames in os.walk(unicode('.')):
        l.extend([os.path.join(root, f) for f in filenames])
    print l
    

And you’ll get a UnicodeDecodeError.

  3. Now try running:

    import os

    l = []
    # A byte-string start path keeps every name as a byte string.
    for root, dirs, filenames in os.walk('.'):
        l.extend([os.path.join(root, f) for f in filenames])
    print l
    

No error and you get a print out!

Thus the safe way in Python 2.x is to make sure you only pass byte strings (str) to os.walk(). You should not pass unicode, or anything that might be unicode, to it, because os.walk will then choke when an implicit ASCII decode fails.
