bodo - w3toppers.com

Parallelize apply after pandas groupby

This seems to work, although it really should be built into pandas:

```python
import pandas as pd
from joblib import Parallel, delayed
import multiprocessing

def tmpFunc(df):
    # Per-group work: add column c as the sum of columns a and b
    df['c'] = df.a + df.b
    return df

def applyParallel(dfGrouped, func):
    # Run func on every group in parallel, one job per CPU core,
    # then concatenate the per-group results into one DataFrame
    retLst = Parallel(n_jobs=multiprocessing.cpu_count())(delayed(func)(group) for name, group in dfGrouped)
    return pd.concat(retLst)

if __name__ == '__main__':
    df = pd.DataFrame({'a': [6, 2, …
```
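For comparison, here is a minimal sketch of the same split-apply-combine pattern using the standard library's `concurrent.futures` instead of joblib. This is a swapped-in technique, not the original answer's code. The names `add_columns`, `apply_parallel` and `executor_cls`, and the sample data, are hypothetical, and pandas is assumed to be installed. Each group is handed to a worker process, `Executor.map` returns the results in input order, and `pd.concat` stitches them back together.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, Optional, Tuple, Type

import pandas as pd


def add_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical per-group function doing the same work as tmpFunc (c = a + b).
    # Copy first so the group passed in is never mutated.
    out = df.copy()
    out["c"] = out["a"] + out["b"]
    return out


def apply_parallel(
    grouped: Iterable[Tuple[object, pd.DataFrame]],
    func: Callable[[pd.DataFrame], pd.DataFrame],
    max_workers: Optional[int] = None,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> pd.DataFrame:
    # Materialize the groups; only the frames (not the group keys) go to workers.
    groups = [group for _, group in grouped]
    if not groups:
        # pd.concat([]) would raise, so return an empty frame instead.
        return pd.DataFrame()
    # Default: one worker per CPU, but never more workers than groups.
    workers = max_workers or min(len(groups), os.cpu_count() or 1)
    with executor_cls(max_workers=workers) as pool:
        # Executor.map yields results in input order, so group order is preserved.
        results = list(pool.map(func, groups))
    return pd.concat(results)


if __name__ == "__main__":
    # Hypothetical sample data; the original example's data is truncated.
    df = pd.DataFrame({"key": ["x", "y", "x", "y"], "a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
    print(apply_parallel(df.groupby("key"), add_columns).sort_index())
```

The result keeps each row's original index label but is ordered group by group, so call `.sort_index()` if you need the original row order back. Because every group is pickled and shipped to another process, this only pays off when `func` does substantial work per group; for cheap column arithmetic like `c = a + b`, a plain vectorized `df.assign(c=df.a + df.b)` will be faster. The `if __name__ == "__main__":` guard is required wherever worker processes are started with `spawn` (the default on Windows and macOS). Passing `executor_cls=ThreadPoolExecutor` runs the same logic in threads, which avoids pickling but is limited by the GIL unless `func` releases it.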