Pivoting a Pandas Dataframe containing strings – ‘No numeric types to aggregate’ error

The default aggfunc in pivot_table is np.sum and it doesn’t know what to do with strings and you haven’t indicated what the index should be properly. Trying something like:

pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],
                                     columns="question", 
                                     values="response_answer",
                                     aggfunc=lambda x: ' '.join(x))

This explicitly sets one row per id, contact_id pair and pivots the set of response_answer values on question. The aggfunc just assures that if you have multiple answers to the same question in the raw data that we just concatenate them together with spaces. The syntax of pivot_table might vary depending on your pandas version.

Here’s a quick example:

In [24]: import pandas as pd

In [25]: import random

In [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})

In [27]: df.head()
Out[27]:
     id question response
0  3100        1      116
1  4500        2      113
2  5000        1      120
3  3900        2      103
4  4300        0      117

In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 3 columns):
id          100 non-null int64
question    100 non-null object
response    100 non-null object
dtypes: int64(1), object(2)
memory usage: 3.1+ KB

In [29]: df.pivot_table(index='id', columns="question", values="response", aggfunc=lambda x: ' '.join(x)).head()
Out[29]:
question        0        1    2        3
id
1000      110 120      NaN  100      NaN
1100          NaN  106 108  104      NaN
1200      104 113      119  NaN      101
1300          102      NaN  116  108 120
1400          NaN      NaN  116      NaN

Leave a Comment