How to get the most informative features for a scikit-learn classifier for different classes?

In the case of binary classification, the coefficient array appears to have been flattened. Let's try to relabel our data with only two labels:

import codecs, re, time
from itertools import chain
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

trainfile = "train.txt"

# Vectorizing data.
train = []
word_vectorizer = CountVectorizer(analyzer="word")
…

PyInstaller: ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'

PyInstaller uses a hook mechanism for each Python module, but sometimes it misses some internal packages, so you need to provide them manually. You can use --hidden-import to add sklearn's missing modules:

pyinstaller -F --hidden-import="sklearn.utils._cython_blas" --hidden-import="sklearn.neighbors.typedefs" --hidden-import="sklearn.neighbors.quad_tree" --hidden-import="sklearn.tree._utils" Datamanager.py
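If you would rather not repeat the flags on every build, the same hidden imports can be listed in the generated .spec file instead. A sketch of the relevant fragment (the spec file name is assumed, and Analysis accepts more arguments than shown):

```
# Fragment of a hypothetical Datamanager.spec
a = Analysis(
    ["Datamanager.py"],
    hiddenimports=[
        "sklearn.utils._cython_blas",
        "sklearn.neighbors.typedefs",
        "sklearn.neighbors.quad_tree",
        "sklearn.tree._utils",
    ],
)
```

Then build with `pyinstaller Datamanager.spec`, and the hidden imports are picked up automatically.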

How to get predictions from XGBoost and XGBoost's scikit-learn wrapper to match?

Please look at this answer: xgboost.train will ignore the parameter n_estimators, while xgboost.XGBRegressor accepts it. In xgboost.train, the number of boosting iterations (i.e. n_estimators) is controlled by num_boost_round (default: 10). It suggests removing n_estimators from the params supplied to xgb.train and replacing it with num_boost_round. So change your params like this: params = {'objective': 'reg:linear', 'max_depth': 2, 'learning_rate': .1, 'min_child_weight': …

Scikit-learn: fit_transform on the test set

You are not supposed to call fit_transform on your test data, only transform. Otherwise, you will get a different vectorization from the one used during training. For the memory issue, I recommend TfidfVectorizer, which has numerous options for reducing the dimensionality (by removing rare unigrams, etc.). UPDATE If the only problem is fitting test data, …
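A minimal sketch of the fit-on-train, transform-on-test pattern, using a hypothetical toy corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpora.
train_docs = ["the cat sat", "the dog ran"]
test_docs = ["a new cat appeared"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_docs)  # learn vocabulary + idf from train only
X_test = vec.transform(test_docs)        # reuse that vocabulary; never fit on test

# Both matrices live in the same feature space.
print(X_train.shape[1] == X_test.shape[1])  # → True
```

Words seen only in the test set (like "appeared" above) are simply dropped by transform, which is exactly what keeps the train and test representations compatible.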

Scikit-learn RandomForest trained on 64-bit Python won't open on 32-bit Python

This occurs because the random forest code uses different types for indices on 32-bit and 64-bit machines. This can, unfortunately, only be fixed by overhauling the random forest code. Since several scikit-learn devs are working on that anyway, I put it on the todo list. For now, the training and testing machines need to have …