pandas-groupby - w3toppers.com

When is it appropriate to use df.value_counts() vs df.groupby('…').count()?

There is a difference. value_counts returns its result sorted by frequency: "The resulting object will be in descending order so that the first element is the most frequently-occurring element." count does not; it sorts the output by the index (created from the column passed to groupby('col')). df.groupby('colA').count() aggregates all columns of df with the count function, so it counts the values in each column, excluding NaNs. So if …
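The difference in ordering and NaN handling can be seen in a small sketch. The frame and its column names (colA, colB) are made up for illustration and do not come from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: colA is the grouping key, colB contains some NaNs.
df = pd.DataFrame({
    "colA": ["b", "a", "b", "c", "b", "a"],
    "colB": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
})

# value_counts: a single Series, most frequent key first.
vc = df["colA"].value_counts()

# groupby(...).count(): one count per remaining column, ordered by the
# group key, with NaN entries left out of each column's count.
gc = df.groupby("colA").count()

print(vc)  # index order: b, a, c (counts 3, 2, 1)
print(gc)  # index order: a, b, c (colB counts 1, 2, 1)
```

Here "b" occurs three times but only two of its colB values are non-null, which is why the two results disagree for that key.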

concise way of flattening multiindex columns

You can do a map join with columns:

    out.columns = out.columns.map('_'.join)
    out
    Out[23]:
         B_mean     B_std  C_median
    A
    1  0.204825  0.169408  0.926347
    2  0.362184  0.404272  0.224119
    3  0.533502  0.380614  0.218105

For some reason (when the columns contain int) I like this way better:

    out.columns.map('{0[0]}_{0[1]}'.format)
    Out[27]: Index(['B_mean', 'B_std', 'C_median'], dtype='object')
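A self-contained sketch of both variants follows. The input frame and the aggregation that produces the two-level columns are assumptions, chosen only so that the column labels match the ones in the excerpt.

```python
import pandas as pd

# Hypothetical input; the values are arbitrary.
df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [0.1, 0.3, 0.5, 0.7],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Aggregating with lists of functions yields two-level column labels:
# ('B', 'mean'), ('B', 'std'), ('C', 'median').
out = df.groupby("A").agg({"B": ["mean", "std"], "C": ["median"]})

# Variant 1: join the levels of each label tuple with an underscore.
flat = out.copy()
flat.columns = flat.columns.map("_".join)

# Variant 2: a format string, which also copes with non-string levels.
alt = out.columns.map("{0[0]}_{0[1]}".format)

# With an integer level, str.join would raise a TypeError, while the
# format-based variant still works.
int_cols = pd.MultiIndex.from_tuples([("B", 1), ("C", 2)])
int_flat = int_cols.map("{0[0]}_{0[1]}".format)

print(list(flat.columns))
print(list(int_flat))
```

The format-string variant is fixed to two levels; for deeper column indexes the template would need one placeholder per level.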

What is the equivalent of SQL “GROUP BY HAVING” on Pandas?

As mentioned in unutbu's comment, groupby's filter is the equivalent of SQL's HAVING:

    In [11]: df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=['A', 'B'])

    In [12]: df
    Out[12]:
       A  B
    0  1  2
    1  1  3
    2  5  6

    In [13]: g = df.groupby('A')  # GROUP BY A

    In [14]: g.filter(lambda x: len(x) > …
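To make the HAVING analogy concrete, here is a minimal sketch on a separate, hypothetical orders table. The column names and both thresholds are illustrative choices, not values taken from the truncated answer.

```python
import pandas as pd

# Hypothetical table of orders.
orders = pd.DataFrame({
    "customer": ["ann", "ann", "bob", "cid", "cid", "cid"],
    "amount": [10, 20, 500, 5, 5, 5],
})

grouped = orders.groupby("customer")  # GROUP BY customer

# Roughly HAVING COUNT(*) >= 2: keep every row of groups with 2+ rows.
repeat_customers = grouped.filter(lambda g: len(g) >= 2)

# Roughly HAVING SUM(amount) > 25: keep rows of groups whose total exceeds 25.
big_spenders = grouped.filter(lambda g: g["amount"].sum() > 25)

print(repeat_customers)
print(big_spenders)
```

Unlike SQL, filter returns the original rows of the surviving groups rather than one aggregated row per group, so an aggregation step is still needed afterwards if a summary is wanted.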

How to summarize on different groupby combinations?

Since your data seem to guarantee 3 unique crops per county ("I am compiling a table of top-3 crops by county."), it suffices to sort the values and assign back.

    import numpy as np

    cols = ['Crop1', 'Crop2', 'Crop3']
    df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)

       County   Crop1  Crop2   Crop3  Total_pop
    0  Harney  apples  grain  melons       2000
    1  …
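The sketch below shows why the row-wise sort helps. Everything beyond the column names is an assumption: the counties, crops and populations are invented, and the final groupby is one possible follow-up rather than the step the truncated answer takes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first two counties grow the same three crops,
# but they are listed in a different order.
df1 = pd.DataFrame({
    "County": ["Harney", "Baker", "Wheeler"],
    "Crop1": ["grain", "melons", "apples"],
    "Crop2": ["melons", "apples", "grain"],
    "Crop3": ["apples", "grain", "corn"],
    "Total_pop": [2000, 1500, 3000],
})

cols = ["Crop1", "Crop2", "Crop3"]

# Sort each row's crops alphabetically so that the same combination
# always appears in the same column order.
df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)

# Assumed follow-up: identical combinations now fall into the same group.
totals = df1.groupby(cols)["Total_pop"].sum()

print(df1)
print(totals)
```

After sorting, Harney and Baker share the combination (apples, grain, melons), so their populations are summed together.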

pandas group by and assign a group id then ungroup

By using ngroup:

    df['grpId'] = df.groupby('socialmedia').ngroup().add(1)
    df
    Out[354]:
        id socialmedia  grpId
    0    1    facebook      1
    1    2    facebook      1
    2    3      google      2
    3    4      google      2
    4    5      google      2
    5    6     twitter      4
    6    7      google      2
    7    8     twitter      4
    8    9    snapchat      3
    9   10     twitter      4
    10  11    facebook      1

Or …
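A runnable version of the ngroup approach, rebuilt from the output shown in the excerpt. The sort=False variant at the end is an addition for comparison and is not necessarily the alternative the truncated answer goes on to describe.

```python
import pandas as pd

# Data reconstructed from the excerpt's output.
df = pd.DataFrame({
    "id": range(1, 12),
    "socialmedia": [
        "facebook", "facebook", "google", "google", "google", "twitter",
        "google", "twitter", "snapchat", "twitter", "facebook",
    ],
})

# ngroup numbers the groups from 0 in sorted key order; add(1) makes the
# ids start at 1. The result is aligned with the original rows, so no
# explicit "ungroup" step is needed.
df["grpId"] = df.groupby("socialmedia").ngroup().add(1)

# Added variant: with sort=False the groups are numbered in order of
# first appearance instead of alphabetical order.
df["grpIdSeen"] = df.groupby("socialmedia", sort=False).ngroup().add(1)

print(df)
```

With sorted keys, snapchat gets 3 and twitter gets 4 even though twitter appears first; with sort=False the two are swapped.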