Scikit learn – fit_transform on the test set

You are not supposed to do fit_transform on your test data, but only transform. Otherwise, you will get different vectorization than the one used during training. For the memory issue, I recommend TfIdfVectorizer, which has numerous options of reducing the dimensionality (by removing rare unigrams etc.). UPDATE If the only problem is fitting test data, … Read more

ROC for multiclass classification

As people mentioned in comments you have to convert your problem into binary by using OneVsAll approach, so you’ll have n_class number of ROC curves. A simple example: from sklearn.metrics import roc_curve, auc from sklearn import datasets from sklearn.multiclass import OneVsRestClassifier from sklearn.svm import LinearSVC from sklearn.preprocessing import label_binarize from sklearn.model_selection import train_test_split import matplotlib.pyplot … Read more

How can I plot a confusion matrix? [duplicate]

you can use plt.matshow() instead of plt.imshow() or you can use seaborn module’s heatmap (see documentation) to plot the confusion matrix import seaborn as sn import pandas as pd import matplotlib.pyplot as plt array = [[33,2,0,0,0,0,0,0,0,1,3], [3,31,0,0,0,0,0,0,0,0,0], [0,4,41,0,0,0,0,0,0,0,1], [0,1,0,30,0,6,0,0,0,0,1], [0,0,0,0,38,10,0,0,0,0,0], [0,0,0,3,1,39,0,0,0,0,4], [0,2,2,0,4,1,31,0,0,0,2], [0,1,0,0,0,0,0,36,0,2,0], [0,0,0,0,0,0,1,5,37,5,1], [3,0,0,0,0,0,0,0,0,39,0], [0,0,0,0,0,0,0,0,0,0,38]] df_cm = pd.DataFrame(array, index = [i for i in … Read more