How to read lines from a file in Python starting from the end

The general problem, reading a text file line by line in reverse, can be solved with at least three approaches.

The difficulty is that each line can have a different length, so you can't know beforehand where each line starts in the file, nor how many lines there are. This means you have to find the line boundaries yourself, one way or another.

General approach #1: Read the entire file into memory

With this approach, you simply read the entire file into memory, in some data structure that subsequently allows you to process the list of lines in reverse. A stack, a doubly linked list, or even an array can do this.

Pros: Really easy to implement (in Python, a plain list plus the built-in reversed() is enough)
Cons: Uses a lot of memory, can take a while to read large files
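
As a minimal sketch of this first approach, the function below reads everything into a list and reverses it. The name read_lines_reversed and the UTF-8 default are my own assumptions for illustration, not part of any library.

```python
def read_lines_reversed(path, encoding="utf-8"):
    """Return all lines of a text file, last line first.

    Hypothetical helper: the whole file is held in memory at once.
    """
    with open(path, encoding=encoding) as f:
        # splitlines() drops the line terminators for us
        lines = f.read().splitlines()
    lines.reverse()
    return lines
```

Calling read_lines_reversed("some.log") (a placeholder file name) gives you an ordinary list, so you can index or slice it freely. The price is the one listed under Cons: the entire file sits in memory.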

General approach #2: Read the entire file, store position of lines

With this approach, you also read through the entire file once, but instead of storing the entire file (all the text) in memory, you only store the byte offsets inside the file where each line starts. You can store these offsets in the same kind of data structure as the one holding the lines in the first approach.

Whenever you want to read line X, you have to re-read that line from the file, starting at the offset you stored for the start of that line.

Pros: Almost as easy to implement as the first approach
Cons: Still has to scan the entire file once, which can take a while for large files
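
The second approach might look roughly like the sketch below. The function names are hypothetical, and I am assuming lines end in "\n" or "\r\n" and that the text is UTF-8. The file is opened in binary mode so that tell() and seek() work with plain byte offsets.

```python
def index_line_offsets(path):
    """Scan the file once and return the byte offset where each line starts."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()          # offset before reading the next line
            line = f.readline()
            if not line:            # empty bytes object means end of file
                break
            offsets.append(pos)
    return offsets


def iter_lines_reversed_by_offset(path, encoding="utf-8"):
    """Yield lines last to first by seeking to each stored offset."""
    offsets = index_line_offsets(path)
    with open(path, "rb") as f:
        for pos in reversed(offsets):
            f.seek(pos)
            # re-read just this one line and strip its terminator
            yield f.readline().decode(encoding).rstrip("\r\n")
```

Only one integer per line is kept in memory, and each line is read from disk a second time when it is actually needed.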

General approach #3: Read the file in reverse, and “figure it out”

With this approach you read the file block-wise (or similar) from the end, and work out where the line endings are. You basically have a buffer of, say, 4096 bytes, and process the last line in that buffer. When your processing, which has to move backward through the buffer one line at a time, reaches the start of the buffer, you read another buffer's worth of data from the area just before the buffer you read previously, and continue processing.

This approach is generally more complicated, because you need to handle such things as lines being broken over two buffers, and long lines could even cover more than two buffers.

It is, however, the one that would require the least amount of memory, and for really large files, it might also be worth doing this to avoid reading through gigabytes of information first.

Pros: Uses little memory, does not require you to read the entire file first
Cons: Much harder to implement and get right for all corner cases
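
To make the third approach concrete, here is one possible sketch, not a definitive implementation. The name reverse_readline, the buf_size parameter and the UTF-8 default are my own choices. It assumes "\n" (or "\r\n") line endings and treats a single newline at the very end of the file as a terminator rather than as an extra empty line. Bytes are only decoded once a whole line has been assembled, so a multi-byte character split across two buffers is not a problem.

```python
import os


def reverse_readline(path, buf_size=4096, encoding="utf-8"):
    """Yield the lines of a file from last to first, reading blocks from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = position = f.tell()
        tail = b""            # start of a line whose beginning we have not seen yet
        first_block = True
        while position > 0:
            read_size = min(buf_size, position)
            position -= read_size
            f.seek(position)
            # prepend the new block to whatever is left over from the last one
            block = f.read(read_size) + tail
            if first_block:
                # a final newline ends the last line; it does not start a new one
                if block.endswith(b"\n"):
                    block = block[:-1]
                first_block = False
            lines = block.split(b"\n")
            # lines[0] may continue into the previous block, so hold it back
            tail = lines[0]
            for line in reversed(lines[1:]):
                yield line.decode(encoding).rstrip("\r")
        if file_size > 0:
            # whatever remains is the first line of the file
            yield tail.decode(encoding).rstrip("\r")
```

Because this is a generator, you can stop after the first few lines (for example, to get the tail of a log) without ever touching the rest of the file. A line longer than buf_size simply keeps growing in tail until its start is found, which covers the "line spans more than two buffers" case mentioned above.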


There are numerous links on the net that show how to do the third approach.
