Yes: use Python 3.5 (currently still a release candidate, but the final release should be out shortly). In Python 3.5, os.walk
was rewritten to be more efficient.
This work was done as part of PEP 471.
Extracted from the PEP:
Python’s built-in os.walk() is significantly slower than it needs to be, because -- in addition to calling os.listdir() on each directory -- it executes the stat() system call or GetFileAttributes() on each file to determine whether the entry is a directory or not.

But the underlying system calls -- FindFirstFile / FindNextFile on Windows and readdir on POSIX systems -- already tell you whether the files returned are directories or not, so no further system calls are needed. Further, the Windows system calls return all the information for a stat_result object on the directory entry, such as file size and last modification time.

In short, you can reduce the number of system calls required for a tree function like os.walk() from approximately 2N to N, where N is the total number of files and directories in the tree. (And because directory trees are usually wider than they are deep, it’s often much better than this.)

In practice, removing all those extra system calls makes os.walk() about 8-9 times as fast on Windows, and about 2-3 times as fast on POSIX systems. So we’re not talking about micro-optimizations. See more benchmarks here.
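
To make the excerpt concrete, here is a minimal sketch of a tree walk written directly against os.scandir(), the function PEP 471 added and that os.walk() calls from Python 3.5 onward. The helper name iter_files is hypothetical and chosen only for this illustration. The point it shows is that is_dir() and is_file() on each entry can usually be answered from the directory listing itself rather than from a separate stat() call per file.

```python
import os


def iter_files(top):
    """Yield (path, size_in_bytes) for every regular file under `top`.

    Hypothetical helper for illustration only; it is not part of the
    standard library.
    """
    stack = [top]
    while stack:
        current = stack.pop()
        # A plain for-loop keeps this compatible with Python 3.5.
        # On 3.6+ you may prefer `with os.scandir(current) as entries:`.
        for entry in os.scandir(current):
            # is_dir()/is_file() normally use the type information the OS
            # returned with the listing, so no extra system call is needed
            # in most cases.
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                # On Windows the size comes from the cached listing data;
                # on POSIX this stat() call does cost one system call.
                yield entry.path, entry.stat(follow_symlinks=False).st_size


if __name__ == "__main__":
    for path, size in iter_files("."):
        print(size, path)
```

If you only need what os.walk() already gives you, no code change should be necessary: running existing os.walk() code on Python 3.5 or later picks up the faster implementation automatically.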