Scikit-learn’s LabelBinarizer vs. OneHotEncoder

A simple example which encodes an array using LabelEncoder, OneHotEncoder, LabelBinarizer is shown below.

I see that OneHotEncoder needs data in integer encoded form first to convert into its respective encoding which is not required in the case of LabelBinarizer.

from numpy import array
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelBinarizer

# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 
'warm', 'hot']
values = array(data)
print "Data: ", values
# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print "Label Encoder:" ,integer_encoded

# onehot encode
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print "OneHot Encoder:", onehot_encoded

#Binary encode
lb = LabelBinarizer()
print "Label Binarizer:", lb.fit_transform(values)

enter image description here

Another good link which explains the OneHotEncoder is: Explain onehotencoder using python

There may be other valid differences between the two which experts can probably explain.

Leave a Comment