Python pandas rolling_apply two column input into function

Not sure if this is still relevant, but with the new rolling classes in pandas, passing raw=False to apply hands each window to the wrapped function as a Series rather than an ndarray. That means we have access to the index of each observation and can use it to look up the matching rows of other columns.

From the docs (as of the release that introduced raw; in later pandas versions the default became False):

raw : bool, default None

False : passes each row or column as a Series to the function.

True or None : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
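To make the difference concrete, here is a minimal sketch that records what type the function receives under each setting. The series `s`, the `seen` dict, and the `record_type` helper are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical sample series with a datetime index
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2024-01-01", periods=3, freq="1s"),
)

seen = {}  # maps a label to the type name of the window object

def record_type(window, key):
    # Remember what kind of object rolling.apply handed us
    seen[key] = type(window).__name__
    return window.sum()

s.rolling(2).apply(lambda w: record_type(w, "raw=False"), raw=False)
s.rolling(2).apply(lambda w: record_type(w, "raw=True"), raw=True)

print(seen)
```

Only the raw=False variant carries an index, which is what the rest of this answer relies on.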

With raw=False, we can do the following:

import numpy as np

### create a func for multiple columns
def cust_func(s):
    # s is the current window of 'col1' as a Series; its index labels
    # identify the matching rows of the other columns in df
    val_for_col2 = df.loc[s.index, 'col2']
    val_for_col3 = df.loc[s.index, 'col3']
    val_for_col4 = df.loc[s.index, 'col4']

    ## apply over multiple column values
    return np.max(s) * np.min(val_for_col2) * np.max(val_for_col3) * np.mean(val_for_col4)


### Apply to the dataframe (the '10s' window needs a datetime-like index)
df.rolling('10s')['col1'].apply(cust_func, raw=False)
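For anyone who wants to try this end to end, here is a self-contained sketch of the same idea. The sample data (six rows spaced 5 seconds apart, with columns named col1 to col4) is made up for illustration and is not from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row every 5 seconds
idx = pd.date_range("2024-01-01", periods=6, freq="5s")
df = pd.DataFrame(
    {
        "col1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "col2": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
        "col3": [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
        "col4": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    },
    index=idx,
)

def cust_func(s):
    # s holds the current window of col1; reuse its index labels
    # to select the same rows from the full dataframe
    window = df.loc[s.index]
    return (
        s.max()
        * window["col2"].min()
        * window["col3"].max()
        * window["col4"].mean()
    )

# Time-based window: each row sees itself plus rows from the previous 10 seconds
result = df.rolling("10s")["col1"].apply(cust_func, raw=False)
print(result)
```

With the default window closing, the row exactly 10 seconds back is excluded, so each window here holds at most two rows.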

Note that we can still use all the functionality of the pandas rolling class here, which is particularly useful when dealing with time-based windows.

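Passing one column while reaching into the entire dataframe feels like a hack, but it works in practice. It does rely on df being accessible from inside the function and on its index labels being unique, since duplicated labels would make df.loc return extra rows.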
