best way to preserve numpy arrays on disk

I’ve compared performance (space and time) for a number of ways to store numpy arrays. Few of them support multiple arrays per file, but perhaps it’s useful anyway.

[Figure: benchmark for numpy array storage]

Npy and raw binary files are both really fast and small for dense data. If the data is sparse or very structured, you might want to use npz with compression, which saves a lot of space but costs some load time.
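To make the npy vs. compressed npz trade-off concrete, here is a minimal sketch. The array shapes, file names, and the temporary directory are made up for illustration and are not taken from the benchmark itself; it only shows the calls involved and how the file size changes for a mostly-zero array.

```python
import os
import tempfile

import numpy as np

# Hypothetical example data: one dense random array, one mostly-zero array.
rng = np.random.default_rng(0)
dense = rng.random((200, 200))
sparse = np.zeros((200, 200))
sparse[::50, ::50] = 1.0

# Illustrative file locations in a throwaway directory.
tmpdir = tempfile.mkdtemp()
dense_npy_path = os.path.join(tmpdir, "dense.npy")
sparse_npy_path = os.path.join(tmpdir, "sparse.npy")
sparse_npz_path = os.path.join(tmpdir, "sparse.npz")

# npy: a small header (dtype, shape, order) followed by the raw data.
np.save(dense_npy_path, dense)
np.save(sparse_npy_path, sparse)

# npz with compression: a zip archive of npy files, stored under keyword names.
np.savez_compressed(sparse_npz_path, sparse=sparse)

# Loading: np.load returns the array for npy and a dict-like archive for npz.
dense_loaded = np.load(dense_npy_path)
with np.load(sparse_npz_path) as archive:
    sparse_loaded = archive["sparse"]

# Compare on-disk sizes of the same mostly-zero array in both formats.
npy_size = os.path.getsize(sparse_npy_path)
npz_size = os.path.getsize(sparse_npz_path)
print(f"sparse.npy: {npy_size} bytes, sparse.npz: {npz_size} bytes")
```

Because an npz file is an archive, it is also one of the few options here that can hold several arrays in a single file: pass more keyword arguments to np.savez_compressed.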

If portability matters, raw binary is a better choice than npy. If human readability is important, you'll have to sacrifice a lot of performance, but csv handles it fairly well (and is, of course, also very portable).
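A rough sketch of the two portable options, assuming "binary" here means the raw dump written by ndarray.tofile. The array, dtype, and file names are hypothetical. Note that a raw dump stores no dtype, shape, or byte order, so the reader has to know all three in advance.

```python
import os
import tempfile

import numpy as np

# Hypothetical example data.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

tmpdir = tempfile.mkdtemp()
bin_path = os.path.join(tmpdir, "data.bin")
csv_path = os.path.join(tmpdir, "data.csv")

# Raw binary: just the bytes of the array, with no header at all.
data.tofile(bin_path)

# Reading it back requires supplying the dtype and shape by hand.
from_bin = np.fromfile(bin_path, dtype=np.float64).reshape(3, 4)

# CSV: human readable and widely supported, but larger and slower to parse.
np.savetxt(csv_path, data, delimiter=",")
from_csv = np.loadtxt(csv_path, delimiter=",")

bin_size = os.path.getsize(bin_path)
csv_size = os.path.getsize(csv_path)
print(f"data.bin: {bin_size} bytes, data.csv: {csv_size} bytes")
```

The raw file can be read from almost any language as a flat buffer of 8-byte floats, which is what makes it portable, as long as the dtype, shape, and byte order are recorded somewhere else.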

More details and the code are available in the GitHub repo.
