Large, persistent DataFrame in pandas

Wes is of course right! I'm just chiming in to provide a slightly more complete code example. I had the same issue with a 129 MB file, which was solved by:

import pandas as pd

# chunksize=1000 makes read_csv return a TextFileReader that yields DataFrames of 1000 rows each
tp = pd.read_csv('large_dataset.csv', iterator=True, chunksize=1000)
# stitch the chunks back into a single DataFrame; on older pandas versions that
# cannot concat the reader directly, use pd.concat(list(tp), ignore_index=True)
df = pd.concat(tp, ignore_index=True)
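
Note that concatenating all the chunks still materializes the full DataFrame in memory; if even that is too large, you can process each chunk as it is read instead. A minimal sketch, assuming a hypothetical numeric column named 'value' in the CSV:

import pandas as pd

# keep only ~1000 rows in memory at a time by aggregating per chunk;
# 'value' is a placeholder column name, not part of the original example
total = 0.0
for chunk in pd.read_csv('large_dataset.csv', chunksize=1000):
    total += chunk['value'].sum()  # reduce each chunk before discarding it
print(total)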
