Saving StandardScaler() model for use on new datasets

You can use joblib's dump function to save the fitted StandardScaler model. Here's a complete example for reference:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

data, target = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(data, target)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
```

If you want to save the sc StandardScaler …
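The answer is truncated, but the saving step it alludes to can be sketched as follows. This is a minimal, self-contained example; the filename scaler.joblib and the toy data are arbitrary choices for illustration:

```python
# Sketch: persisting a fitted StandardScaler with joblib and reloading it
# for use on new data (joblib ships as a scikit-learn dependency).
import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
sc = StandardScaler()
sc.fit(X_train)

joblib.dump(sc, "scaler.joblib")          # save the fitted scaler to disk
sc_loaded = joblib.load("scaler.joblib")  # reload it later, e.g. in another script

# The reloaded scaler applies the *training-set* mean and std to new data,
# which is exactly what you want at inference time.
X_new = np.array([[2.0, 20.0]])
print(sc_loaded.transform(X_new))
```

Reloading the same fitted object (rather than re-fitting on new data) is what keeps the preprocessing consistent between training and inference.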

Under what parameters are SVC and LinearSVC in scikit-learn equivalent?

In a mathematical sense you need to set:

```python
SVC(kernel="linear", **kwargs)      # by default it uses the RBF kernel
```

and

```python
LinearSVC(loss="hinge", **kwargs)   # by default it uses the squared hinge loss
```

Another element, which cannot be easily fixed, is increasing intercept_scaling in LinearSVC, as in this implementation the bias is regularized (which is not true in SVC, nor should it be …

Model help using Scikit-learn when using GridSearch

GridSearchCV, as @Gauthier Feuillen said, is used to search for the best parameters of an estimator on the given data. Description of GridSearchCV:

```python
gcv = GridSearchCV(pipe, clf_params, cv=cv)
gcv.fit(features, labels)
```

1. clf_params will be expanded to get all possible combinations separately using ParameterGrid.
2. features will now be split into features_train and features_test using cv. The same goes for labels.
3. Now the grid search estimator …
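The flow above can be sketched end to end. The pipeline, parameter grid, and dataset below are hypothetical stand-ins for the pipe, clf_params, features, and labels referenced in the answer:

```python
# Sketch: a small pipeline whose hyperparameters are expanded via ParameterGrid
# internally by GridSearchCV, then evaluated with cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

features, labels = load_iris(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
# 3 values of C x 2 kernels = 6 candidate combinations, each scored over 5 folds
clf_params = {"clf__C": [0.1, 1.0, 10.0], "clf__kernel": ["linear", "rbf"]}

gcv = GridSearchCV(pipe, clf_params, cv=5)
gcv.fit(features, labels)

print(gcv.best_params_)   # the winning combination
print(gcv.best_score_)    # its mean cross-validated score
```

After fitting, gcv itself acts as the refitted best estimator, so gcv.predict(...) uses the winning parameter combination trained on all of the data.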

How should I teach machine learning algorithm using data with big disproportion of classes? (SVM)

The most basic approach here is to use the so-called "class weighting scheme": in the classical SVM formulation there is a C parameter used to control the misclassification count. It can be changed into C1 and C2 parameters used for class 1 and class 2 respectively. The most common choice of C1 and C2 for a …
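In scikit-learn, the C1/C2 idea maps onto the class_weight parameter, which multiplies C per class. A sketch on hypothetical imbalanced toy data, using class_weight="balanced" (which sets each class's weight inversely proportional to its frequency):

```python
# Sketch: per-class C weighting for an imbalanced problem via class_weight.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# ~90/10 imbalanced two-class toy data
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

plain = SVC().fit(X, y)                           # same C for both classes
weighted = SVC(class_weight="balanced").fit(X, y) # C scaled up for the minority class

# Recall on the minority class; the weighted model typically recovers more of it.
print((plain.predict(X)[y == 1] == 1).mean())
print((weighted.predict(X)[y == 1] == 1).mean())
```

Effectively, misclassifying a minority-class sample now costs more in the objective, pushing the decision boundary away from the minority class.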