Use groupby `apply` and return a Series to rename columns

Use the groupby `apply` method to perform an aggregation that:

Renames the columns
Allows for spaces in the names
Allows you to order the returned columns in any way you choose
Allows for interactions between columns
Returns a single level index and NOT a MultiIndex

To do this:

Create a custom function that you pass to `apply`
This custom function receives each group as a DataFrame
Return a Series from the function
The index of the Series becomes the new column names

Create fake data

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   'Score': [9, 1, 8, 7, 7, 6, 9]})

Create a custom function that returns a Series
The variable x inside of my_agg is a DataFrame holding the rows of one group

def my_agg(x):
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std':  x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max':  x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])

Pass this custom function to the groupby apply method

df.groupby('User').apply(my_agg)

The big downside is that this function will be much slower than `agg` with the cythonized aggregations.
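If speed matters, one possible alternative is named aggregation, which keeps the built-in aggregations on the `agg` path. The sketch below assumes pandas 0.25 or later, where `agg` accepts `output_name=(column, function)` pairs; unpacking a dict is what allows output names containing spaces. The variable name `fast` and the separate handling of the cross-column sum are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   "Score": [9, 1, 8, 7, 7, 6, 9]})

# Named aggregation: output column name -> (input column, aggregation).
# Unpacking a dict allows output names that contain spaces.
fast = df.groupby("User").agg(**{
    "Amount mean": ("Amount", "mean"),
    "Amount std": ("Amount", "std"),
    "Score Max": ("Score", "max"),
    "Score Sum": ("Score", "sum"),
})

# Named aggregation works on one column at a time, so a quantity that mixes
# columns is computed row-wise first and then summed per user.
fast["Amount Score Sum"] = (df["Amount"] * df["Score"]).groupby(df["User"]).sum()

print(fast)
```

The result has a single-level column index in the order the names were listed. Interactions between columns still need a separate step like the last line, which is where the `apply` approach remains more convenient.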

Using a dictionary with groupby `agg` method

Support for a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on GitHub about how to improve this functionality in the future. Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.

df.groupby('User')['Amount'].agg(['sum', 'count'])

Output (for the six-row data shown in the next example, where user3 has a single row)

       sum  count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1
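Because the list form names each result column after its function, one way to get custom names is to rename afterwards. This is a minimal sketch, not part of the original answer; the labels "Amount Sum" and "Amount Count" and the variable `totals` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

# Aggregate with a list of functions, then map the default column
# names ('sum', 'count') to the labels you actually want.
totals = (df.groupby("User")["Amount"]
            .agg(["sum", "count"])
            .rename(columns={"sum": "Amount Sum", "count": "Amount Count"}))

print(totals)
```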

It is still possible to use a dictionary to explicitly denote different aggregations for different columns, as in the following example, which adds another numeric column named Other.

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   'Other': [1, 2, 3, 4, 5, 6]})

df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})

Output

      Amount       Other          
         sum count   max       std
User                              
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN
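The dictionary form returns MultiIndex columns, as the two header rows above show. If a single level is preferred, the (column, function) pairs can be joined into flat labels. A minimal sketch, assuming a space is an acceptable separator; the variable name `result` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   "Other": [1, 2, 3, 4, 5, 6]})

result = df.groupby("User").agg({"Amount": ["sum", "count"], "Other": ["max", "std"]})

# Each column label is a tuple such as ('Amount', 'sum');
# join the parts to get one flat label per column.
result.columns = [" ".join(col) for col in result.columns]

print(result.columns.tolist())
```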

Rename result columns from Pandas aggregation (“FutureWarning: using a dict with renaming is deprecated”)
