categorical-data
OneHotEncoder categorical_features deprecated, how to transform specific column
There are actually two warnings: FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". …
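Since categorical_features is deprecated, the usual replacement for "encode only some columns" is to wrap OneHotEncoder in a ColumnTransformer. Below is a minimal sketch that assumes scikit-learn 0.20 or later, where sklearn.compose.ColumnTransformer exists. The array X and its column layout are hypothetical. Passing categories="auto" is the setting the warning above recommends.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is an integer-coded category, column 1 is numeric.
X = np.array([[0, 10.0],
              [2, 20.0],
              [1, 30.0],
              [2, 40.0]])

# One-hot encode only column 0 and pass the remaining columns through unchanged.
# categories="auto" derives the categories from the unique values seen in fit.
ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(categories="auto"), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense ndarray
)
X_encoded = ct.fit_transform(X)
print(X_encoded)
```

The encoded columns come first in the output, followed by the passthrough columns. Each row becomes three indicator columns for the categories 0, 1 and 2, plus the original numeric value.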
Pandas DataFrame sort by categorical column but by specific class ordering
I think you need Categorical with the parameter ordered=True; sorting with sort_values then works nicely. From the documentation for Categorical: "Ordered Categoricals can be sorted according to the custom order of the categories and can have a min and max value."
import pandas as pd
df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter', 'GOTV', 'Persuasion', 'Persuasion+GOTV']})
…
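The excerpt stops before the ordering is defined, so the following is only a sketch of the idea. The order list is a hypothetical class ordering chosen for illustration. Once the column is an ordered Categorical, sort_values follows that order instead of sorting alphabetically, and min/max respect it as well.

```python
import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter',
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})

# Hypothetical custom order; replace it with the class ordering you need.
order = ['Likely Supporter', 'GOTV', 'Persuasion', 'Persuasion+GOTV']
df['a'] = pd.Categorical(df['a'], categories=order, ordered=True)

# sort_values now sorts by the position of each value in `order`.
df_sorted = df.sort_values('a')
print(df_sorted)

# Ordered categoricals also support min/max according to the same order.
print(df['a'].min(), df['a'].max())
```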
One-Hot Encoding in [R] | Categorical to Dummy Variables [duplicate]
dd <- read.table(text="
RACE AGE.BELOW.21 CLASS
HISPANIC 0 A
ASIAN 1 A
HISPANIC 1 D
CAUCASIAN 1 B", header=TRUE)
with(dd, data.frame(model.matrix(~RACE-1,dd), AGE.BELOW.21,CLASS))
## RACEASIAN RACECAUCASIAN RACEHISPANIC AGE.BELOW.21 CLASS
## 1 0 0 1 0 A
## 2 1 0 0 1 A
## 3 0 0 1 1 D
## 4 0 1 0 1 …
Add extra level to factors in dataframe
The levels function accepts the replacement form levels(x) <- value, so it's very easy to add extra levels:
f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
str(f1)
Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
levels(f1) <- c(levels(f1), "No Answer")
f1[is.na(f1)] <- "No Answer"
…
Make Frequency Histogram for Factor Variables
It seems like you want barplot(prop.table(table(animals))). Note, however, that this is a bar plot of relative frequencies, not a histogram.
Scikit-learn’s LabelBinarizer vs. OneHotEncoder
Below is a simple example that encodes an array using LabelEncoder, OneHotEncoder and LabelBinarizer. I see that OneHotEncoder needs the data in integer-encoded form first (true for scikit-learn versions before 0.20), which is not required in the case of LabelBinarizer.
from numpy import array
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import …
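Because the excerpt is truncated, the following is a self-contained sketch of the comparison rather than the original answer's code. The data array is hypothetical. The example runs LabelEncoder followed by OneHotEncoder and compares the result with LabelBinarizer applied directly to the 1-D string labels. Both paths produce the same indicator matrix here.

```python
from numpy import array
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, LabelBinarizer

# Hypothetical labels.
data = array(['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm'])

# LabelEncoder: strings -> integer codes (classes are sorted: cold, hot, warm).
integer_encoded = LabelEncoder().fit_transform(data)

# OneHotEncoder works on 2-D input of shape (n_samples, n_features),
# so the 1-D codes are reshaped into a single column. Its output is sparse by default.
onehot = OneHotEncoder(categories='auto').fit_transform(
    integer_encoded.reshape(-1, 1)).toarray()

# LabelBinarizer takes the 1-D string labels directly and returns a dense array.
binarized = LabelBinarizer().fit_transform(data)

print(integer_encoded)
print(onehot)
print(binarized)
```

As a design note, LabelBinarizer is intended for target labels (y), while OneHotEncoder is meant for feature matrices (X) and can encode several columns at once. One further difference: with only two classes, LabelBinarizer returns a single 0/1 column instead of two.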
R error “sum not meaningful for factors”
The error occurs when you call sum(x) and x is a factor. It means that one of your columns, although it looks like numbers, is actually a factor (what you are seeing is its text representation). The simple fix is to convert it to numeric; however, this needs an intermediate step of converting to character first. …
Interpretation of ordered and non-ordered factors, vs. numerical predictors in model summary
This is not really a mixed-model-specific question, but rather a general question about model parameterization in R. Let's try a simple example.
set.seed(101)
d <- data.frame(x=sample(1:4,size=30,replace=TRUE))
d$y <- rnorm(30,1+2*d$x,sd=0.01)
x as numeric
This just does a linear regression: the x parameter denotes the change in y per unit change in x; the intercept …
Plotting with ggplot2: “Error: Discrete value supplied to continuous scale” on categorical y-axis
As mentioned in the comments, a continuous scale cannot be used on a variable of factor type. You could convert the factor to numeric as follows, just after you define the meltDF variable:
meltDF$variable=as.numeric(levels(meltDF$variable))[meltDF$variable]
Then execute the ggplot command:
ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y = variable)) + scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, …