Converting PANDAS dataframe from monthly to daily

First, parse the month-datestrings into Pandas timestamps:

df['month'] = pd.to_datetime(df['month'], format="%Y-%m")
#        month ticker   b     c
# 0 2014-01-01    AAU  10  0.04
# 1 2014-02-01    AAU  20  0.03
# 2 2014-03-01    AAU  13  0.06
# 3 2014-12-01    AAU  11  0.03
# 4 2014-01-01    ZZY  11  0.11
# 5 2014-02-01    ZZY   6  0.03
# 6 2014-12-01    ZZY  17  0.09

Next, pivot the DataFrame, using the month as the index and the ticker as a column level:

df = df.pivot(index='month', columns="ticker")
#              b         c      
# ticker     AAU ZZY   AAU   ZZY
# month                         
# 2014-01-01  10  11  0.04  0.11
# 2014-02-01  20   6  0.03  0.03
# 2014-03-01  13 NaN  0.06   NaN
# 2014-12-01  11  17  0.03  0.09

By pivoting now, we will be able to forward-fill each column more easily later.

Now find the start and end dates:

start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)

Interestingly, note that adding pd.DateOffset(day=31) will not always result in a date that ends on day 31. If the month is February, adding pd.DateOffset(day=31) returns the last day in February:

In [130]: pd.Timestamp('2014-2-28') + pd.DateOffset(day=31)
Out[130]: Timestamp('2014-02-28 00:00:00')

That’s nice, since that means adding pd.DateOffset(day=31) will always give us the last valid day in the month.

Now we can reindex and forward-fill the DataFrame:

dates = pd.date_range(start_date, end_date, freq='D')
dates.name="date"
df = df.reindex(dates, method='ffill')

which yields

In [160]: df.head()
Out[160]: 
             b         c      
ticker     AAU ZZY   AAU   ZZY
date                          
2014-01-01  10  11  0.04  0.11
2014-01-02  10  11  0.04  0.11
2014-01-03  10  11  0.04  0.11
2014-01-04  10  11  0.04  0.11
2014-01-05  10  11  0.04  0.11

In [161]: df.tail()
Out[161]: 
             b         c      
ticker     AAU ZZY   AAU   ZZY
date                          
2014-12-27  11  17  0.03  0.09
2014-12-28  11  17  0.03  0.09
2014-12-29  11  17  0.03  0.09
2014-12-30  11  17  0.03  0.09
2014-12-31  11  17  0.03  0.09

To move the ticker out of the column index and back into a column:

df = df.stack('ticker')
df = df.sortlevel(level=1)
df = df.reset_index()

So putting it all together:

import pandas as pd
df = pd.read_table('data', sep='\s+')
df['month'] = pd.to_datetime(df['month'], format="%Y-%m")
df = df.pivot(index='month', columns="ticker")

start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='D')
dates.name="date"
df = df.reindex(dates, method='ffill')

df = df.stack('ticker')
df = df.sortlevel(level=1)
df = df.reset_index()

yields

In [163]: df.head()
Out[163]: 
        date ticker   b     c
0 2014-01-01    AAU  10  0.04
1 2014-01-02    AAU  10  0.04
2 2014-01-03    AAU  10  0.04
3 2014-01-04    AAU  10  0.04
4 2014-01-05    AAU  10  0.04

In [164]: df.tail()
Out[164]: 
          date ticker   b     c
450 2014-12-27    ZZY  17  0.09
451 2014-12-28    ZZY  17  0.09
452 2014-12-29    ZZY  17  0.09
453 2014-12-30    ZZY  17  0.09
454 2014-12-31    ZZY  17  0.09

Leave a Comment