TfidfVectorizer in scikit-learn : ValueError: np.nan is an invalid document

You need to convert the dtype object to unicode string as is clearly mentioned in the traceback.

x = v.fit_transform(df['Review'].values.astype('U'))  ## Even astype(str) would work

From the Doc page of TFIDF Vectorizer:

fit_transform(raw_documents, y=None)

Parameters: raw_documents : iterable
an iterable which yields either str, unicode or file objects

Leave a Comment