I wanted to share what I’ve done to work around this problem. Given a `pd.DataFrame` and a window, I generate a stacked ndarray using `np.dstack` (see answer). I then convert it to a `pd.Panel` and, using `pd.Panel.to_frame`, convert it to a `pd.DataFrame`. At this point, I have a `pd.DataFrame` with one more index level than the original `pd.DataFrame`. The new level records each row's position within its rolled period. For example, if the roll window is 3, the new index level will be `[0, 1, 2]`, one item for each period. I can now `groupby` `level=0` and return the groupby object. This gives me an object that I can manipulate much more intuitively.
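To make the stacking step concrete, the sketch below uses made-up toy data to print the shapes produced by `np.dstack` and `.T`. The result has shape `(n_windows, n_columns, w)`, which is the layout the Panel's `items`, `major_axis` and `minor_axis` are built from. On NumPy 1.20 or later, `np.lib.stride_tricks.sliding_window_view` returns the same array as a strided view. That is an alternative, not what the answer itself uses.

```python
import numpy as np

# Toy data (assumption): 5 rows, 3 columns, chosen so the shapes are easy to tell apart
values = np.arange(15).reshape(5, 3)
w = 2

# One (w, n_columns) slice per window, the same slicing roll() performs
windows = [values[i:i + w, :] for i in range(len(values) - w + 1)]

stacked = np.dstack(windows)  # shape (w, n_columns, n_windows) = (2, 3, 4)
rolled = stacked.T            # shape (n_windows, n_columns, w) = (4, 3, 2)
print(stacked.shape, rolled.shape)

# NumPy >= 1.20: the same (n_windows, n_columns, w) array, built as a view
view = np.lib.stride_tricks.sliding_window_view(values, w, axis=0)
print(np.array_equal(rolled, view))  # True
```

Each `rolled[k]` is window `k` transposed, so column values run along the last axis.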
Roll Function
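```python
import pandas as pd
import numpy as np

def roll(df, w):
    # One (w, n_columns) slice per window, stacked along a third axis,
    # then transposed to shape (n_windows, n_columns, w)
    roll_array = np.dstack([df.values[i:i+w, :] for i in range(len(df.index) - w + 1)]).T
    # Note: pd.Panel was deprecated in pandas 0.20 and has since been removed,
    # so this version only runs on old pandas releases
    panel = pd.Panel(roll_array,
                     items=df.index[w-1:],  # label each window by its last row
                     major_axis=df.columns,
                     minor_axis=pd.Index(range(w), name="roll"))
    # Rows become (window, roll) pairs; group by the window label
    return panel.to_frame().unstack().T.groupby(level=0)
```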
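Because `pd.Panel` no longer exists in current pandas, here is a minimal Panel-free sketch that builds the same two-level index directly with `pd.concat`. The outer level is the label of each window's last row, and the inner level is named `roll`. This is a swapped-in technique rather than the code above, and it does not reproduce the `major` column name. The data is the `df` printed below, typed in by hand.

```python
import pandas as pd

def roll(df, w):
    """Panel-free sketch: stack each w-row window under a new 'roll' index level."""
    # Each window gets a fresh 0..w-1 index, which becomes the "roll" level
    frames = [df.iloc[i:i + w].reset_index(drop=True) for i in range(len(df) - w + 1)]
    # Outer level: label of the last row in each window (like items=df.index[w-1:])
    stacked = pd.concat(frames, keys=df.index[w - 1:], names=[None, "roll"])
    return stacked.groupby(level=0)

df = pd.DataFrame({"A": [0.44, 0.46, 0.46, 0.85, 0.78],
                   "B": [0.41, 0.47, 0.02, 0.82, 0.76]})
print(roll(df, 2).sum())
```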
Demonstration
```python
np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(5, 2).round(2), columns=['A', 'B'])
print(df)
```

```
      A     B
0  0.44  0.41
1  0.46  0.47
2  0.46  0.02
3  0.85  0.82
4  0.78  0.76
```
Let’s sum:

```python
rolled_df = roll(df, 2)
print(rolled_df.sum())
```

```
major     A     B
1      0.90  0.88
2      0.92  0.49
3      1.31  0.84
4      1.63  1.58
```
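As a sanity check for plain reductions like `sum`, pandas' built-in `DataFrame.rolling` should give the same numbers. It also emits `NaN` for the first, incomplete window, which `dropna()` removes here. The values are typed in from the `df` printed above rather than regenerated from the seed. The grouped approach is really meant for operations that `rolling` cannot express directly, such as the per-window matrix product further down.

```python
import pandas as pd

df = pd.DataFrame({"A": [0.44, 0.46, 0.46, 0.85, 0.78],
                   "B": [0.41, 0.47, 0.02, 0.82, 0.76]})

# Window of 2 rows; the first row has no full window, so it is NaN and dropped
builtin = df.rolling(2).sum().dropna()
print(builtin)
```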
To peek under the hood, we can see the structure:
```python
print(rolled_df.apply(lambda x: x))
```

```
major      A     B
  roll            
1 0     0.44  0.41
  1     0.46  0.47
2 0     0.46  0.47
  1     0.46  0.02
3 0     0.46  0.02
  1     0.85  0.82
4 0     0.85  0.82
  1     0.78  0.76
```
But what about the purpose I built this for, rolling multi-factor regression? For now, I’ll settle for matrix multiplication.
```python
X = np.array([2, 3])
print(rolled_df.apply(lambda df: pd.Series(df.values.dot(X))))
```

```
      0     1
1  2.11  2.33
2  2.33  0.98
3  0.98  4.16
4  4.16  3.84
```
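Since rolling multi-factor regression was the original goal, here is a minimal sketch of how it could plug into the same groupby object. This is an assumption about how one might do it, not the author's method. It reuses a Panel-free `roll` so it runs on current pandas and fits ordinary least squares with an intercept via `np.linalg.lstsq`. The helper `ols`, the column names `x1`, `x2`, `y`, and the data are all hypothetical. The toy data satisfies `y = 1 + 2*x1 - x2` exactly, so every window should recover those coefficients.

```python
import numpy as np
import pandas as pd

def roll(df, w):
    # Panel-free stand-in for roll() above: group w-row windows by their last label
    frames = [df.iloc[i:i + w].reset_index(drop=True) for i in range(len(df) - w + 1)]
    return pd.concat(frames, keys=df.index[w - 1:], names=[None, "roll"]).groupby(level=0)

def ols(window, y_col, x_cols):
    """Hypothetical helper: least-squares fit of y_col on x_cols plus an intercept."""
    X = np.column_stack([np.ones(len(window)), window[x_cols].to_numpy()])
    beta, *_ = np.linalg.lstsq(X, window[y_col].to_numpy(), rcond=None)
    return pd.Series(beta, index=["const", *x_cols])

# Hypothetical data with an exact linear relationship y = 1 + 2*x1 - x2
data = pd.DataFrame({"x1": [0.0, 1, 2, 3, 4, 5], "x2": [1.0, 0, 3, 2, 5, 7]})
data["y"] = 1 + 2 * data["x1"] - data["x2"]

# One row of coefficients per 4-row window, labelled by the window's last row
coefs = roll(data, 4).apply(ols, y_col="y", x_cols=["x1", "x2"])
print(coefs)
```

With real data, each window needs at least as many rows as regressors (plus the intercept) for the fit to be determined.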