How to iterate over consecutive chunks of Pandas dataframe efficiently

Use numpy’s array_split():

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(10, 3))
for chunk in np.array_split(data, 5):
  assert len(chunk) == len(data) / 5, "This assert may fail for the last chunk if data lenght isn't divisible by 5"

Leave a Comment