TfidfVectorizer in scikit-learn: ValueError: np.nan is an invalid document

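The traceback states the problem: the column has dtype object and contains NaN, so the vectorizer receives a value that is not a string. Convert the column to unicode strings before passing it in: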

x = v.fit_transform(df['Review'].values.astype('U'))  # astype(str) would also work

From the TfidfVectorizer documentation:

fit_transform(raw_documents, y=None)

Parameters:
raw_documents : iterable
    An iterable which yields either str, unicode or file objects.
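A minimal sketch of the fix on a toy frame may make the behaviour clearer. The data and the variable names other than `df['Review']` are assumptions for illustration only. One thing worth knowing: casting with astype('U') turns a missing value into the literal string 'nan', which then ends up in the vocabulary as a token. If that is not wanted, replacing missing values with an empty string first (pandas fillna) is one possible alternative; it is shown here as an option, not as part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical data: an object-dtype column where one review is missing.
df = pd.DataFrame(
    {"Review": pd.Series(["good product", np.nan, "bad service"], dtype=object)}
)

# Fix from the answer: cast every entry to a unicode string.
# NaN becomes the string 'nan', so 'nan' is learned as a term.
vec_cast = TfidfVectorizer()
x_cast = vec_cast.fit_transform(df["Review"].values.astype("U"))

# Alternative (an assumption, not from the answer): treat missing
# reviews as empty documents so that no 'nan' token is created.
vec_fill = TfidfVectorizer()
x_fill = vec_fill.fit_transform(df["Review"].fillna(""))

print(sorted(vec_cast.vocabulary_))
print(sorted(vec_fill.vocabulary_))
```

Both calls run without the ValueError; the difference is only whether the missing rows contribute a 'nan' term or an all-zero row.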
