When is it appropriate to use df.value_counts() vs df.groupby('…').count()?

There is a difference. value_counts returns its result in descending order, so that the first element is the most frequently occurring one. count does not: it sorts the output by the index (created from the column passed to groupby('col')). df.groupby('colA').count() aggregates all columns of df with the function count, so it counts values excluding NaNs. So if …
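To make the ordering and NaN behaviour concrete, here is a minimal sketch. The frame and its column names (colA, colB) are made up for illustration, and value_counts is called on a single column (a Series) rather than on the whole frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: colB contains one NaN inside group "y"
df = pd.DataFrame({
    "colA": ["x", "y", "y", "z", "y", "x"],
    "colB": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
})

# value_counts: ordered by frequency, most common value first
vc = df["colA"].value_counts()

# groupby().count(): ordered by group key, and counts the non-NaN
# values of every remaining column (here only colB)
gc = df.groupby("colA").count()

print(vc)
print(gc)
```

In this sketch "y" occurs three times in colA, but gc reports 2 for it in colB because the NaN is not counted.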

How to get number of groups in a groupby object in pandas?

Simple, Fast, and Pandaic: ngroups

Newer versions of the groupby API (pandas >= 0.23) provide this (undocumented) attribute, which stores the number of groups in a GroupBy object.

    # setup
    df = pd.DataFrame({'A': list('aabbcccd')})
    dfg = df.groupby('A')

    # call `.ngroups` on the GroupBy object
    dfg.ngroups
    # 4

Note that this is different from GroupBy.groups which …
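As a self-contained sketch of the same setup, the snippet below compares ngroups with two other ways of arriving at the group count. The variable names are illustrative, and the comparison assumes the grouping column has no missing values.

```python
import pandas as pd

df = pd.DataFrame({"A": list("aabbcccd")})
dfg = df.groupby("A")

# Attribute holding the number of groups
n_groups = dfg.ngroups

# Same number via the length of the GroupBy object
n_groups_len = len(dfg)

# Same number via the distinct values of the grouping column
# (equal here because column A contains no NaN)
n_unique = df["A"].nunique()

print(n_groups, n_groups_len, n_unique)
```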

concise way of flattening multiindex columns

You can do a map join with columns

    out.columns = out.columns.map('_'.join)
    out
    Out[23]:
         B_mean     B_std  C_median
    A
    1  0.204825  0.169408  0.926347
    2  0.362184  0.404272  0.224119
    3  0.533502  0.380614  0.218105

When the column labels contain ints (which '_'.join cannot handle), I like this way better

    out.columns.map('{0[0]}_{0[1]}'.format)
    Out[27]: Index(['B_mean', 'B_std', 'C_median'], dtype='object')
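A runnable sketch of both variants follows. The input frame is invented so that the aggregation produces the same two-level column layout (B/mean, B/std, C/median) as in the output shown; the numbers themselves are not the original data.

```python
import pandas as pd

# Hypothetical input; only the resulting column structure matters here
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3],
    "B": [0.1, 0.3, 0.2, 0.6, 0.4, 0.8],
    "C": [0.9, 0.7, 0.2, 0.4, 0.1, 0.3],
})

# Aggregating with lists of functions yields MultiIndex columns
out = df.groupby("A").agg({"B": ["mean", "std"], "C": ["median"]})

# Variant 1: join each (level0, level1) tuple with an underscore
flat_join = out.columns.map("_".join)

# Variant 2: format each tuple by position; also works for non-string labels
flat_format = out.columns.map("{0[0]}_{0[1]}".format)

out.columns = flat_join
print(out)
```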

Pandas groupby and aggregation output should include all the original columns (including the ones not aggregated on)

agg with a dict of functions

Create a dict of functions and pass it to agg. You'll also need as_index=False to prevent the group columns from becoming the index in your output.

    f = {'NET_AMT': 'sum', 'QTY_SOLD': 'sum', 'UPC_DSC': 'first'}
    df.groupby(['month', 'UPC_ID'], as_index=False).agg(f)

         month  UPC_ID UPC_DSC  NET_AMT  QTY_SOLD
    0  2017.02     111   desc1       10         2
    1 …
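The following sketch runs the same dict-based aggregation end to end. The column names come from the answer, but the rows are invented for illustration (months are stored as strings here), so the totals differ from the output shown.

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the answer
df = pd.DataFrame({
    "month": ["2017-02", "2017-02", "2017-02", "2017-03"],
    "UPC_ID": [111, 111, 222, 111],
    "UPC_DSC": ["desc1", "desc1", "desc2", "desc1"],
    "NET_AMT": [10, 20, 5, 7],
    "QTY_SOLD": [2, 3, 1, 4],
})

# Sum the numeric columns and carry the description along with 'first'
f = {"NET_AMT": "sum", "QTY_SOLD": "sum", "UPC_DSC": "first"}

# as_index=False keeps month and UPC_ID as ordinary columns
result = df.groupby(["month", "UPC_ID"], as_index=False).agg(f)
print(result)
```

Using 'first' for UPC_DSC is only safe when the description is constant within each group, as it is in this sample.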

get first and last values in a groupby

Option 1

    def first_last(df):
        # .ix was removed in pandas 1.0; .iloc selects by position
        return df.iloc[[0, -1]]

    df.groupby(level=0, group_keys=False).apply(first_last)

Option 2 – only works if index is unique

    idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
    df.loc[idx]

Option 3 – per notes below, this only makes sense when there are no NAs

I also abused the agg function. The code below works, but is far uglier.

    df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    …
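Here is a minimal runnable sketch of Option 1, written with positional .iloc indexing, which works on current pandas versions. The sample frame, its index name (key), and the column name (val) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame: the index holds the group key, three rows per group
df = pd.DataFrame(
    {"val": [1, 2, 3, 4, 5, 6]},
    index=pd.Index(["a", "a", "a", "b", "b", "b"], name="key"),
)

def first_last(group):
    # Positional selection of the first and last row of each group
    return group.iloc[[0, -1]]

# group_keys=False keeps the original index instead of prepending the key
result = df.groupby(level=0, group_keys=False).apply(first_last)
print(result)
```

Note that a group with a single row would appear twice in the result, since positions 0 and -1 then refer to the same row.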

How to summarize on different groupby combinations?

Since your data seem to guarantee 3 unique crops per county ("I am compiling a table of top-3 crops by county."), it suffices to sort the values and assign back.

    import numpy as np

    cols = ['Crop1', 'Crop2', 'Crop3']
    df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)

       County   Crop1  Crop2   Crop3  Total_pop
    0  Harney  apples  grain  melons       2000
    1 …
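The sketch below shows why sorting each row helps: once the crop columns are in a canonical order, counties with the same set of crops fall into the same group. The second county and the final groupby step are assumptions added for illustration and are not taken from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical input: both counties grow the same crops, ranked differently
df1 = pd.DataFrame({
    "County": ["Harney", "Baker"],
    "Crop1": ["grain", "melons"],
    "Crop2": ["melons", "apples"],
    "Crop3": ["apples", "grain"],
    "Total_pop": [2000, 1500],
})

cols = ["Crop1", "Crop2", "Crop3"]

# Sort the crop names within each row so the combination is order-independent
df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)

# Assumed follow-up: total population per crop combination
summary = df1.groupby(cols, as_index=False)["Total_pop"].sum()
print(summary)
```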