Split pandas dataframe based on groupby
gb = df.groupby('ZZ')
[gb.get_group(x) for x in gb.groups]
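As a rough sketch of the same idea, iterating over the GroupBy object yields (key, sub-frame) pairs directly, which avoids calling get_group for every key. The sample frame and the name parts below are made up for illustration; only the 'ZZ' column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the 'ZZ' column name is from the snippet above
df = pd.DataFrame({"ZZ": ["a", "b", "a", "c"], "val": [1, 2, 3, 4]})

# Iterating a GroupBy yields (group key, sub-DataFrame) pairs
parts = {key: group for key, group in df.groupby("ZZ")}

for key, group in parts.items():
    print(key, len(group))
```

Keeping the pieces in a dict rather than a list makes it easier to look up a sub-frame by its group key later.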
I use groupby and size:
df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
Timing, 1,000,000 rows:
df = pd.DataFrame(dict(id=np.random.choice(100, 1000000), group=np.random.choice(20, 1000000), term=np.random.choice(10, 1000000)))
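To see what the size/unstack combination produces, here is a minimal sketch on a tiny hand-made frame (the values are assumptions chosen for illustration, not the timing data above). Each (id, group) pair becomes a row, each term becomes a column, and missing combinations are filled with 0.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this sketch
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "group": ["x", "x", "y", "x"],
    "term": ["t1", "t1", "t2", "t2"],
})

# Count rows per (id, group, term), then pivot 'term' into columns
counts = df.groupby(["id", "group", "term"]).size().unstack(fill_value=0)
print(counts)
```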
Alternatively, you can do it this way:
In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A    Cat-Tiger
B     Ball-Bat
Name: val, dtype: object
UPDATE: answering the question from the comment:
In [2]: df
Out[2]:
  col    val
0   A    Cat
1   A  Tiger
2   A  Panda
3   B   Ball
4   B    Bat
5   B  Mouse
6   B    Egg
In [3]: …
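For a self-contained version of the first step, the sketch below rebuilds a four-row frame consistent with Out[48] (the frame itself is an assumption, since the excerpt does not show the input for that call) and joins the strings in each group.

```python
import pandas as pd

# Input assumed from the Out[48] result shown above
df = pd.DataFrame({
    "col": ["A", "A", "B", "B"],
    "val": ["Cat", "Tiger", "Ball", "Bat"],
})

# Passing the bound method '-'.join as the aggregator concatenates each group's strings
joined = df.groupby("col")["val"].agg("-".join)
print(joined)
```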
Question 1: How can I perform aggregation with pandas?
Expanded aggregation documentation.
Aggregating functions are the ones that reduce the dimension of the returned objects. That is, the output Series/DataFrame has the same number of rows as the original, or fewer. Some common aggregating functions are tabulated below:
Function    Description
mean()      Compute mean of groups
sum()       Compute sum of group …
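A minimal sketch of the two functions named in the table, applied through agg on a made-up frame (the column names key and x are hypothetical). The result has one row per group, so it is never longer than the input.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1.0, 3.0, 5.0]})

# Apply several aggregating functions at once; one output row per group
summary = df.groupby("key")["x"].agg(["mean", "sum"])
print(summary)
```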
You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:
In [119]: df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
    name      text  month
0  name1    hej,du     11
2  name1     aj,oj     12
4  name2  fin,katt     11
6  …
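The following sketch reproduces the first two output rows on a small frame. The input rows are inferred from the joined values shown in Out[119], so treat them as an assumption rather than the original data.

```python
import pandas as pd

# Input inferred from the joined output above (an assumption)
df = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name1"],
    "text": ["hej", "du", "aj", "oj"],
    "month": [11, 11, 12, 12],
})

# transform broadcasts the joined string back to every row of its group
df["text"] = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))

# Each group now repeats the same text, so dropping duplicates leaves one row per group
result = df[["name", "text", "month"]].drop_duplicates()
print(result)
```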
You want to use transform; this returns a Series whose index is aligned to the df, so you can then add it as a new column:
In [74]: df = pd.DataFrame({'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05', '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'], 'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'], 'Data2': [11, 8, 10, 15, 110, …
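Since the excerpt is cut off before the transform call, the sketch below only illustrates the general pattern on a reduced, made-up frame. The choice of "sum" as the aggregate and the column name Data2_total are assumptions, not taken from the original answer.

```python
import pandas as pd

# Reduced, hypothetical data; not the full frame from the excerpt
df = pd.DataFrame({"Sym": ["aapl", "aapl", "aaww"], "Data2": [11, 8, 110]})

# transform returns one value per original row, so it can be assigned as a column.
# 'sum' and the name 'Data2_total' are assumptions made for this sketch.
df["Data2_total"] = df.groupby("Sym")["Data2"].transform("sum")
print(df)
```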
Quick Answer: The simplest way to get row counts per group is by calling .size(), which returns a Series:
df.groupby(['col1','col2']).size()
Usually you want this result as a DataFrame (instead of a Series), so you can do:
df.groupby(['col1', 'col2']).size().reset_index(name="counts")
If you want to find out how to calculate the row counts and other statistics for each …
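A small runnable sketch of the two calls, using invented sample values for col1 and col2:

```python
import pandas as pd

# Hypothetical sample values
df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": ["x", "x", "y"]})

# Series of row counts, indexed by (col1, col2)
sizes = df.groupby(["col1", "col2"]).size()

# Same counts as a flat DataFrame with a named 'counts' column
counts = sizes.reset_index(name="counts")
print(counts)
```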
You can do this using groupby to group on the column of interest and then apply list to every group:
In [1]: df = pd.DataFrame({'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6]})
df
Out[1]:
   a  b
0  A  1
1  A  2
2  B  5
3  B  5
4  B  4
5  C  6
In [2]: df.groupby('a')['b'].apply(list)
Out[2]:
a
A …
In [1]: df
Out[1]:
    Sp  Mt Value  count
0  MM1  S1     a      3
1  MM1  S1     n      2
2  MM1  S3    cb      5
3  MM2  S3    mk      8
4  MM2  S4    bg     10
5  MM2  S4   dgd      1
6  MM4  S2    rd      2
7  MM4  S2    cb      2
8  MM4  S2   uyi      7
In [2]: …
We start by answering the first question:
Question 1: Why do I get ValueError: Index contains duplicate entries, cannot reshape?
This occurs because pandas is attempting to reindex either a columns or an index object with duplicate entries. There are various methods that can perform a pivot. Some of them are not well suited …
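To make the error concrete, here is a minimal sketch with a made-up frame in which the pair (r1, c1) appears twice. DataFrame.pivot cannot decide which value to keep and raises, while pivot_table resolves the duplicates with an aggregation function. Using "sum" here is only one possible choice, picked for illustration.

```python
import pandas as pd

# Hypothetical data: the (row, col) pair ('r1', 'c1') occurs twice
df = pd.DataFrame({
    "row": ["r1", "r1", "r2"],
    "col": ["c1", "c1", "c2"],
    "val": [1, 2, 3],
})

message = None
try:
    df.pivot(index="row", columns="col", values="val")
except ValueError as exc:
    # pivot has no rule for combining the duplicated cell, so it raises
    message = str(exc)
print(message)

# pivot_table aggregates duplicated cells instead of raising
wide = df.pivot_table(index="row", columns="col", values="val", aggfunc="sum")
print(wide)
```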