file.tell() inconsistency

Iterating over an open file object uses a read-ahead buffer to increase efficiency. As a result, the file pointer advances through the file in large steps as you loop over the lines, so tell() reports where the buffer ends rather than where the current line ends.

From the File Objects documentation:

In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.

If you need to rely on .tell(), don’t use the file object as an iterator. You can turn .readline() into an iterator instead (at the price of some performance loss):

for line in iter(f.readline, ''):
    print(f.tell())  # offset just past the line that was read

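This uses the two-argument form of iter(), which turns any callable into an iterator: f.readline is called repeatedly, and iteration stops when it returns the sentinel value, here the empty string '' that readline() returns at end of file.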
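As a rough sketch of where this is useful, the same pattern can record the offset at which each line starts and then seek() back to a particular line. The helper name line_offsets, the sample data, and the temporary file are all made up for illustration. The file is opened in binary mode here, so the sentinel is b'' instead of ''.

```python
import os
import tempfile


def line_offsets(f):
    """Return the offset at which each line of f starts.

    Hypothetical helper: it loops with iter(f.readline, b'') instead of
    `for line in f`, so tell() is only called between readline() calls.
    """
    offsets = []
    pos = f.tell()
    for line in iter(f.readline, b''):
        offsets.append(pos)   # where the line just read began
        pos = f.tell()        # where the next line will begin
    return offsets


# Made-up sample data, written to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path, 'rb') as f:
    offsets = line_offsets(f)
    f.seek(offsets[2])        # jump straight to the third line
    third_line = f.readline()

os.remove(path)
print(offsets)  # [0, 6, 11]
```

Calling seek() with a recorded offset is safe because those offsets came from tell() calls made outside of file-object iteration.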
