Enumerate each row for each group in a DataFrame

There’s cumcount, for precisely this case:

df['col_c'] = df.groupby('col_a').cumcount()

As it says in the docs:

Number each item in each group from 0 to the length of that group – 1.
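As a minimal sketch of what that looks like in practice (the sample frame below is hypothetical, since the original data is not shown here; only the column names col_a, col_b and col_c come from the answer):

```python
import pandas as pd

# Hypothetical sample data; the frame from the original question is not shown.
df = pd.DataFrame({
    "col_a": ["A", "B", "C", "A", "A", "C", "B", "B", "A", "C"],
    "col_b": range(10),
})

# cumcount numbers the rows of each group, starting at 0,
# in the order the rows already appear in the frame.
df["col_c"] = df.groupby("col_a").cumcount()

print(df)
```

Note that no sorting is needed here: the result is aligned to the original index, so it can be assigned straight back as a column.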

Original answer (from before cumcount was added to pandas).

You could create a helper function to do this:

import numpy as np

def add_col_c(x):
    # Number the rows of this group 0, 1, 2, ...
    x['col_c'] = np.arange(len(x))
    return x

First sort by column col_a:

In [11]: df.sort_values('col_a', inplace=True)

then apply this function across each group:

In [12]: g = df.groupby('col_a', as_index=False)

In [13]: g.apply(add_col_c)
Out[13]:
  col_a  col_b  col_c
3     A      3      0
8     A      8      1
0     A      0      2
4     A      4      3
6     B      6      0
1     B      1      1
7     B      7      2
9     C      9      0
2     C      2      1
5     C      5      2

To number from 1 instead (1, 2, ...), you could use np.arange(1, len(x) + 1).
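With cumcount, 1-based numbering should work the same way: since cumcount starts at 0, adding 1 shifts every group to start at 1. A small sketch, again on made-up data:

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    "col_a": ["A", "B", "A", "B", "A"],
    "col_b": [10, 20, 30, 40, 50],
})

# cumcount is 0-based, so + 1 gives 1, 2, ... within each group.
df["col_c"] = df.groupby("col_a").cumcount() + 1

print(df)
```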
