The way I usually do it is with a FeatureUnion, using a FunctionTransformer to pull out the relevant columns.
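Important notes:
- You have to define your functions with def, since annoyingly you can’t use a lambda (or a partial that wraps one) in a FunctionTransformer if you want to pickle your model.
- You need to initialize FunctionTransformer with validate=False, so that your functions receive the DataFrame itself rather than a NumPy array (validate=True was the default before scikit-learn 0.22).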
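The pickling restriction comes from Python's standard pickle module rather than from scikit-learn: a function is saved by reference (its module plus qualified name), and a lambda's qualified name is just "<lambda>", so it can't be looked up again on load. A minimal stdlib-only sketch, where the list-of-rows helper is a hypothetical stand-in for the DataFrame column selector and the file is assumed to be run as a script:

```python
import functools
import pickle


def get_text_cols(rows):
    # Module-level function: pickle records it by reference (module + name).
    return [row[:2] for row in rows]


# A lambda's qualified name is "<lambda>", which pickle can't look up again.
get_text_cols_lambda = lambda rows: [row[:2] for row in rows]


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(get_text_cols))                            # True
print(can_pickle(get_text_cols_lambda))                     # False
print(can_pickle(functools.partial(get_text_cols_lambda)))  # False: the partial carries the lambda
```

Since a fitted FunctionTransformer stores the function you pass it, the same rule applies to any model that contains one.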
Something like this:
from sklearn.pipeline import make_union, make_pipeline
from sklearn.preprocessing import FunctionTransformer, OrdinalEncoder, MinMaxScaler

def get_text_cols(df):
    return df[['name', 'fruit']]

def get_num_cols(df):
    return df[['height', 'age']]

# LabelEncoder only accepts a 1-D target (its fit_transform takes just y),
# so it can't sit inside a Pipeline; OrdinalEncoder does the same for 2-D features.
vec = make_union(
    make_pipeline(FunctionTransformer(get_text_cols, validate=False), OrdinalEncoder()),
    make_pipeline(FunctionTransformer(get_num_cols, validate=False), MinMaxScaler()),
)
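A usage sketch, assuming pandas and a recent scikit-learn are installed. The toy DataFrame is hypothetical (only its column names come from the selectors above), and OrdinalEncoder stands in for LabelEncoder because LabelEncoder is meant for 1-D targets and fails inside a Pipeline. It fits the union, checks the output width, and round-trips the fitted model through pickle, which works because both selectors are module-level def functions:

```python
import pickle

import pandas as pd
from sklearn.pipeline import make_union, make_pipeline
from sklearn.preprocessing import FunctionTransformer, OrdinalEncoder, MinMaxScaler


def get_text_cols(df):
    return df[['name', 'fruit']]


def get_num_cols(df):
    return df[['height', 'age']]


vec = make_union(
    make_pipeline(FunctionTransformer(get_text_cols, validate=False), OrdinalEncoder()),
    make_pipeline(FunctionTransformer(get_num_cols, validate=False), MinMaxScaler()),
)

# Hypothetical toy data using the column names from the selectors above.
df = pd.DataFrame({
    'name': ['ann', 'bob', 'cat'],
    'fruit': ['apple', 'pear', 'apple'],
    'height': [150.0, 180.0, 165.0],
    'age': [30, 40, 50],
})

X = vec.fit_transform(df)
print(X.shape)  # (3, 4): 2 encoded text columns followed by 2 scaled numeric columns

# Both helpers are module-level functions, so the fitted union survives pickling.
restored = pickle.loads(pickle.dumps(vec))
print((restored.transform(df) == X).all())  # True
```

The union simply concatenates each branch's output side by side, in the order the branches were passed to make_union.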