Pandas convert a column of list to dummies

Using s for your df['groups']:

In [21]: s = pd.Series({0: ['a', 'b', 'c'], 1:['c'], 2: ['b', 'c', 'e'], 3: ['a', 'c'], 4: ['b', 'e'] })

In [22]: s
Out[22]:
0    [a, b, c]
1          [c]
2    [b, c, e]
3       [a, c]
4       [b, e]
dtype: object

This is a possible solution:

In [23]: pd.get_dummies(s.apply(pd.Series).stack()).groupby(level=0).sum()
Out[23]:
   a  b  c  e
0  1  1  1  0
1  0  0  1  0
2  0  1  1  1
3  1  0  1  0
4  0  1  0  1

The logic of this is:

  • .apply(pd.Series) converts the Series of lists into a DataFrame with one column per list position (shorter lists are padded with NaN)
  • .stack() puts everything back into one column, creating a MultiIndex of (original row, list position)
  • pd.get_dummies() creates the dummy columns
  • .groupby(level=0).sum() remerges the rows that belong to the same original row, by summing over the second index level and keeping only the original one (level=0). Older pandas versions wrote this as .sum(level=0), but the level keyword of sum was removed in pandas 2.0.
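To make the four steps visible, here is a minimal sketch that names each intermediate result. It assumes a recent pandas (where get_dummies returns booleans and groupby-sum turns them into counts); the variable names wide, long, dummies and result are illustrative only.

```python
import pandas as pd

# The example Series from above: one list of labels per row
s = pd.Series({0: ['a', 'b', 'c'], 1: ['c'], 2: ['b', 'c', 'e'],
               3: ['a', 'c'], 4: ['b', 'e']})

# Step 1: one column per list position; shorter lists are padded with NaN
wide = s.apply(pd.Series)

# Step 2: back to a single column indexed by (original row, list position)
long = wide.stack()

# Step 3: one indicator column per label (any NaN left over gets no column)
dummies = pd.get_dummies(long)

# Step 4: collapse the second index level so each original row appears once
result = dummies.groupby(level=0).sum()
print(result)
```

Printing wide, long and dummies in turn is a quick way to see what each step contributes.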

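A slight variant skips the stack and merges the duplicate columns instead: pd.get_dummies(s.apply(pd.Series), prefix='', prefix_sep='').T.groupby(level=0).sum().T (in older pandas this was written as .sum(level=0, axis=1)).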
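A sketch of that column-merging variant, assuming pandas 2.x. The transpose is used so the grouping happens on rows, which sidesteps groupby(axis=1) (deprecated since pandas 2.1); wide_dummies is just an illustrative name.

```python
import pandas as pd

s = pd.Series({0: ['a', 'b', 'c'], 1: ['c'], 2: ['b', 'c', 'e'],
               3: ['a', 'c'], 4: ['b', 'e']})

# Dummies per list position. The empty prefix and separator make each column
# name just the label, so a label appears once per position it occurs in.
wide_dummies = pd.get_dummies(s.apply(pd.Series), prefix='', prefix_sep='')

# Merge the duplicate label columns: transpose, group equal labels, sum, and
# transpose back so rows are the original rows again.
result = wide_dummies.T.groupby(level=0).sum().T
print(result)
```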

I don't know whether this will be efficient enough, but if performance matters, storing lists in a DataFrame is not a very good idea in the first place.
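If the lists can't be avoided, one different technique (not the one above) goes through strings instead: join each list with a separator and let Series.str.get_dummies split it back out. This sketch assumes no label contains the separator character '|'; whether it is faster than the stack-based version on real data is something to measure rather than assume.

```python
import pandas as pd

s = pd.Series({0: ['a', 'b', 'c'], 1: ['c'], 2: ['b', 'c', 'e'],
               3: ['a', 'c'], 4: ['b', 'e']})

# str.join concatenates the elements of each list, e.g. ['a', 'b'] -> 'a|b';
# str.get_dummies then splits on the same separator and builds 0/1 columns.
result = s.str.join('|').str.get_dummies(sep='|')
print(result)
```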
