Why doesn’t Python’s mmap work with large files?

From IEEE 1003.1:

The mmap() function shall establish a mapping between a process’ address space and a file, shared memory object, or [TYM] typed memory object.

Mapping a whole file needs a contiguous range of virtual address space as large as the file, because establishing that mapping is exactly what mmap() does.

The fact that it isn’t really running out of physical memory doesn’t matter: you can’t map more address space than you have available. Since you then take the result and access it as if it were memory, how exactly do you propose to access more than 2^32 bytes into the file? Even if mmap() didn’t fail, you could still only read the first 4 GB before you ran out of space in a 32-bit address space, and in practice less, because the process’s own code, heap, stacks, and libraries occupy part of that space. You can, of course, mmap() a sliding window over the file, sized to fit in the address space you have left, but that won’t necessarily net you any benefit unless you can arrange your access pattern to limit how many times you have to revisit previous windows.
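As a rough illustration of the sliding-window approach, here is a minimal sketch using Python's standard mmap module. The function name count_byte and the window_size default are hypothetical choices for this example, not anything from the original question. It relies on the offset argument of mmap.mmap(), which must be a multiple of mmap.ALLOCATIONGRANULARITY, and it only handles a single-byte search, since a longer pattern could straddle two windows.

```python
import mmap
import os


def count_byte(path, value, window_size=64 * 1024 * 1024):
    """Count occurrences of a single byte in a file, one mapped window at a time.

    `value` is a bytes object of length 1, e.g. b"\n".
    `window_size` is an illustrative default, not a recommended setting.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    # Window offsets must be multiples of the allocation granularity,
    # so round the window size down to such a multiple (at least one unit).
    window_size = max(gran, window_size - window_size % gran)
    file_size = os.path.getsize(path)
    total = 0
    with open(path, "rb") as f:
        offset = 0
        # An empty file never enters the loop (mapping zero bytes would fail).
        while offset < file_size:
            # The last window is usually shorter than the others.
            length = min(window_size, file_size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as window:
                pos = window.find(value)
                while pos != -1:
                    total += 1
                    pos = window.find(value, pos + 1)
            # The previous window is unmapped before the next one is created,
            # so only `length` bytes of address space are in use at once.
            offset += length
    return total
```

Calling count_byte("big.log", b"\n") would then count the lines in a (hypothetical) file big.log while never holding more than one window mapped. This is a strictly sequential scan, which is the favorable case: each window is visited exactly once.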
