pandas-groupby - w3toppers.com

Pandas Groupby / List to Multiple Rows

IIUC, I think you can do it like this:

    dfg = df.groupby(['AccountID', 'Last Name', df.groupby(['AccountID', 'Last Name']).cumcount() + 1]).first().unstack()
    dfg.columns = [f'{i}{j}' for i, j in dfg.columns]
    df_out = dfg.sort_index(axis=1, key=lambda x: x.str[-1])
    df_out.reset_index()

Output:

    AccountID Last Name Contract1 First Name1 Address1 City1 State1 Contract2 First Name2 Address2 City2 State2 Contract3 First Name3 Address3 City3 …
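As a rough illustration of the cumcount-then-unstack idea, here is a small sketch. The sample data and the names `n`, `wide`, and `out` are made up for this example and are not from the original question; the column sort from the answer is left out to keep it short.

```python
import pandas as pd

# Hypothetical sample data: one account with two rows, one with a single row.
df = pd.DataFrame({
    "AccountID": [1, 1, 2],
    "Last Name": ["Smith", "Smith", "Jones"],
    "Contract": ["A", "B", "C"],
    "City": ["Austin", "Boston", "Chicago"],
})

# Number the rows inside each (AccountID, Last Name) group: 1, 2, ...
n = df.groupby(["AccountID", "Last Name"]).cumcount() + 1

# Group on the keys plus the running number, then move that number
# from the index into the columns.
wide = df.groupby(["AccountID", "Last Name", n]).first().unstack()

# Flatten the two-level column labels, e.g. ("Contract", 1) -> "Contract1".
wide.columns = [f"{col}{i}" for col, i in wide.columns]
out = wide.reset_index()
print(out)
```

Groups with fewer rows than the largest group end up with missing values in the higher-numbered columns.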

Identifying statistical outliers with pandas: groupby and reduce rows into different dataframe

Get the mean and std. We need to loop over each column, get the mean and std, then set the maximum and minimum values we accept for that column.

    # Storing mean and std for every col as a tuple, 0 index for max value,
    # and 1 for min value
    outliers = []
    for …
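Since the excerpt is cut off, here is one possible sketch of the approach it describes, not the answer's actual code. The sample data, the cutoff `k = 2`, and the names `bounds`, `mask`, and `inliers` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: column "a" contains one obviously extreme value.
df = pd.DataFrame({
    "a": [10, 11, 9, 10, 12, 10, 11, 9, 10, 100],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

k = 2  # assumed cutoff: accept values within k standard deviations of the mean

# For every column store a tuple: index 0 is the max accepted value,
# index 1 is the min accepted value.
bounds = {}
for col in df.columns:
    mean, std = df[col].mean(), df[col].std()
    bounds[col] = (mean + k * std, mean - k * std)

# A row is an outlier if any of its columns falls outside the bounds.
mask = pd.Series(False, index=df.index)
for col, (upper, lower) in bounds.items():
    mask |= (df[col] > upper) | (df[col] < lower)

outliers = df[mask]   # rows moved into a separate dataframe
inliers = df[~mask]   # rows that stay
print(outliers)
```

With only ten rows, no single value can sit three sample standard deviations from the mean, which is why this sketch uses a cutoff of two.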

What is the difference between pandas agg and apply function?

apply applies the function to each group (your Species). Your function returns 1, so you end up with 1 value for each of 3 groups. agg aggregates each column (feature) for each group, so you end up with one value per column per group. Do read the groupby docs; they're quite helpful. There are also …
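The difference in shape can be seen with a small sketch. The dataframe below is a made-up stand-in for the question's data, and the lambda returning 1 mirrors the function described in the answer.

```python
import pandas as pd

# Hypothetical stand-in for the iris-style data in the question.
df = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor", "virginica"],
    "sepal": [5.1, 4.9, 7.0, 6.3],
    "petal": [1.4, 1.4, 4.7, 6.0],
})

grouped = df.groupby("Species")[["sepal", "petal"]]

# apply: the function receives each group's sub-dataframe as a whole,
# so a function returning 1 gives one value per group.
by_apply = grouped.apply(lambda group: 1)

# agg: the function is applied to each column within each group,
# so the same function gives one value per column per group.
by_agg = grouped.agg(lambda col: 1)

print(by_apply.shape)
print(by_agg.shape)
```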

Pandas get frequency of item occurrences in a column as percentage

Use value_counts with normalize=True:

    df['gender'].value_counts(normalize=True) * 100

Each value returned by value_counts(normalize=True) is a fraction in the range (0, 1]. We multiply by 100 here to get a percentage.
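A minimal runnable sketch of this, using a made-up gender column:

```python
import pandas as pd

# Hypothetical data: three "F" entries and one "M".
df = pd.DataFrame({"gender": ["F", "F", "M", "F"]})

# normalize=True returns relative frequencies; * 100 turns them into percentages.
pct = df["gender"].value_counts(normalize=True) * 100
print(pct)
```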

Python Pandas Conditional Sum with Groupby

First group by the key1 column:

    In [11]: g = df.groupby('key1')

and then for each group take the sub-DataFrame where key2 equals 'one' and sum the data1 column:

    In [12]: g.apply(lambda x: x[x['key2'] == 'one']['data1'].sum())
    Out[12]:
    key1
    a    0.093391
    b    1.468194
    dtype: float64

To explain what's going on, let's look at the 'a' group:

    In [21]: …
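Here is a small self-contained sketch of the same conditional sum. The data is made up, and the filter-first variant at the end is an alternative added here, not something the answer proposes.

```python
import pandas as pd

# Hypothetical data with round numbers so the sums are easy to check.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Per group: keep the rows where key2 == "one", then sum data1.
per_group = df.groupby("key1")[["key2", "data1"]].apply(
    lambda x: x[x["key2"] == "one"]["data1"].sum()
)

# Alternative: filter the rows first, then group and sum.
filtered_first = df[df["key2"] == "one"].groupby("key1")["data1"].sum()

print(per_group)
print(filtered_first)
```

One difference worth keeping in mind: the filter-first variant drops any group that has no matching rows, while the apply variant keeps it with a sum of 0.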

Python Pandas Group by date using datetime data

You can group by the dates of the Date_Time column using dt.date:

    df = df.groupby([df['Date_Time'].dt.date]).mean()

Sample:

    df = pd.DataFrame({'Date_Time': pd.date_range('10/1/2001 10:00:00', periods=3, freq='10H'),
                       'B': [4, 5, 6]})
    print(df)
       B           Date_Time
    0  4 2001-10-01 10:00:00
    1  5 2001-10-01 20:00:00
    2  6 2001-10-02 06:00:00

    print(df['Date_Time'].dt.date)
    0    2001-10-01
    1    2001-10-01
    2    2001-10-02
    Name: Date_Time, dtype: object

    df = df.groupby([df['Date_Time'].dt.date])['B'].mean()
    print(df)
    …
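The sample can be run end to end roughly as follows. This sketch passes the 10-hour step as a `pd.Timedelta` instead of the '10H' string, which is a substitution made here because newer pandas versions warn about the uppercase alias; the name `daily_mean` is also introduced only for this example.

```python
import pandas as pd

# Three timestamps 10 hours apart: two fall on 2001-10-01, one on 2001-10-02.
df = pd.DataFrame({
    "Date_Time": pd.date_range("2001-10-01 10:00:00", periods=3,
                               freq=pd.Timedelta(hours=10)),
    "B": [4, 5, 6],
})

# Group on the calendar date of each timestamp and average column B.
daily_mean = df.groupby(df["Date_Time"].dt.date)["B"].mean()
print(daily_mean)
```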

How to drop duplicates based on two or more subsets criteria in Pandas data-frame

Your syntax is wrong. Here's the correct way:

    df.drop_duplicates(subset=['bio', 'center', 'outcome'])

Or in this specific case, simply:

    df.drop_duplicates()

Both return the following:

       bio center outcome
    0    1    one       f
    2    1    two       f
    3    4  three       f

Take a look at the df.drop_duplicates documentation for syntax details. subset should be a sequence of column …
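A short sketch of both calls follows. The input frame is an assumption: it is reconstructed so that it yields the output shown above, with row 1 duplicating row 0, and may not match the asker's actual data.

```python
import pandas as pd

# Assumed input: row 1 repeats row 0 in all three columns.
df = pd.DataFrame({
    "bio": [1, 1, 1, 4],
    "center": ["one", "one", "two", "three"],
    "outcome": ["f", "f", "f", "f"],
})

# Drop rows that repeat an earlier row in the listed columns.
by_subset = df.drop_duplicates(subset=["bio", "center", "outcome"])

# With no subset, all columns are compared, which is the same thing here.
by_all = df.drop_duplicates()

print(by_subset)
```

By default the first occurrence of each duplicate is kept, so the original index labels 0, 2, and 3 survive.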

Pandas Groupby and Sum Only One Column

The only way to do this would be to include C in your groupby (the groupby function can accept a list). Give this a try:

    df.groupby(['A', 'C'])['B'].sum()

One other thing to note: if you need to work with df after the aggregation, you can also use the as_index=False option to return a DataFrame object. This one …
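To make the effect of as_index=False concrete, here is a small sketch with made-up data; the names `as_series` and `as_frame` are chosen only for this example.

```python
import pandas as pd

# Hypothetical data: C is constant within each value of A.
df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "C": ["p", "p", "q"],
    "B": [1, 2, 3],
})

# Default: the group keys become a MultiIndex and the result is a Series.
as_series = df.groupby(["A", "C"])["B"].sum()

# as_index=False: the keys stay as ordinary columns of a DataFrame.
as_frame = df.groupby(["A", "C"], as_index=False)["B"].sum()

print(as_series)
print(as_frame)
```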

Use pandas.shift() within a group

Pandas' grouped objects have a groupby.DataFrameGroupBy.shift method, which shifts a specified column in each group by n periods, just like the regular DataFrame's shift method:

    df['prev_value'] = df.groupby('object')['value'].shift()

For the following example dataframe:

    print(df)
       object  period  value
    0       1       1     24
    1       1       2     67
    2       1       4     89
    3       2       4      5
    4       2 …
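Because the example frame is cut off, the sketch below uses its own illustrative data. The first four rows follow the visible part of the excerpt and the last row is invented.

```python
import pandas as pd

# Illustrative data; the final row is made up for this example.
df = pd.DataFrame({
    "object": [1, 1, 1, 2, 2],
    "period": [1, 2, 4, 4, 5],
    "value": [24, 67, 89, 5, 23],
})

# Shift "value" down by one row inside each "object" group, so the
# first row of every group has no previous value and becomes NaN.
df["prev_value"] = df.groupby("object")["value"].shift()
print(df)
```

Note that the shift follows row order within each group, not the period column, so the frame should be sorted as needed beforehand.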