Factorize a column of strings in pandas

series

RowX    yes
RowY     no
RowW    yes
RowJ     no
RowA    yes
RowR     no
RowX    yes
RowY    yes
RowW    yes
RowJ    yes
RowA    yes
RowR     no
Name: Column 3, dtype: object

pd.factorize

1 - series.factorize()[0]
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
    

np.where

np.where(series == 'yes', 1, 0)
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

pd.Categorical/astype('category')

pd.Categorical(series).codes
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0], dtype=int8)
series.astype('category').cat.codes

RowX    1
RowY    0
RowW    1
RowJ    0
RowA    1
RowR    0
RowX    1
RowY    1
RowW    1
RowJ    1
RowA    1
RowR    0
dtype: int8

pd.Series.replace

series.replace({'yes' : 1, 'no' : 0})
 
RowX    1
RowY    0
RowW    1
RowJ    0
RowA    1
RowR    0
RowX    1
RowY    1
RowW    1
RowJ    1
RowA    1
RowR    0
Name: Column 3, dtype: int64

A fun, generalised version of the above:

series.replace({r'^(?!yes).*$' : 0}, regex=True).astype(bool).astype(int)

RowX    1
RowY    0
RowW    1
RowJ    0
RowA    1
RowR    0
RowX    1
RowY    1
RowW    1
RowJ    1
RowA    1
RowR    0
Name: Column 3, dtype: int64

Anything that is not "yes" is 0.

Leave a Comment