Converting pandas column of comma-separated strings into dummy variables

Use str.get_dummies

df['col'].str.get_dummies(sep=',')

    a   b   c   d
0   1   0   0   0
1   1   1   1   0
2   1   1   0   1
3   0   0   0   1
4   0   0   1   1

Edit: Updating the answer to address some questions.

Qn 1: Why is it that the series method get_dummies does not accept the argument prefix=… while pandas.get_dummies() does accept it

Series.str.get_dummies is a series level method (as the name suggests!). We are one hot encoding values in one Series (or a DataFrame column) and hence there is no need to use prefix. Pandas.get_dummies on the other hand can one hot encode multiple columns. In which case, the prefix parameter works as an identifier of the original column.

If you want to apply prefix to str.get_dummies, you can always use DataFrame.add_prefix

df['col'].str.get_dummies(sep=',').add_prefix('col_')

Qn 2: If you have more than one column to begin with, how do you merge the dummies back into the original frame?
You can use DataFrame.concat to merge one hot encoded columns with the rest of the columns in dataframe.

df = pd.DataFrame({'other':['x','y','x','x','q'],'col':['a','a,b,c','a,b,d','d','c,d']})
df = pd.concat([df, df['col'].str.get_dummies(sep=',')], axis = 1).drop('col', 1)

  other a   b   c   d
0   x   1   0   0   0
1   y   1   1   1   0
2   x   1   1   0   1
3   x   0   0   0   1
4   q   0   0   1   1

Leave a Comment