predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading

You can inspect the predict function with body(predict.lm). There you will see this line:

if (p < ncol(X) && !(missing(newdata) || is.null(newdata))) warning("prediction from a rank-deficient fit may be misleading")

This warning checks if the rank of your data matrix is at least equal to the number of parameters you want to fit. One way … Read more
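For example, a perfectly collinear predictor produces a rank-deficient fit, and a subsequent predict() on new data then triggers exactly this warning (a minimal made-up sketch):

set.seed(1)
x1 <- rnorm(20)
x2 <- 2 * x1                          # perfectly collinear with x1, so lm() drops it
y  <- x1 + rnorm(20)
fit <- lm(y ~ x1 + x2)                # coefficient for x2 is NA; rank p < ncol(model matrix)
predict(fit, newdata = data.frame(x1 = 0.5, x2 = 1))
# Warning: prediction from a rank-deficient fit may be misleading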

scipy, lognormal distribution – parameters

The distributions in scipy are coded in a generic way with respect to the two parameters location and scale, so that location is the parameter (loc) which shifts the distribution to the left or right, while scale is the parameter which compresses or stretches the distribution. For the two-parameter lognormal distribution, the “mean” and “std dev” correspond … Read more
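Concretely (writing mu and sigma here for the mean and standard deviation of log(X), notation added for illustration), the generic form is X = loc + scale * Y with Y the standard shape, so for the lognormal case:

s = sigma
scale = exp(mu)
loc = 0 (for the standard two-parameter lognormal)

which gives log(X) ~ N(mu, sigma^2).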

Stepwise regression using p-values to drop variables with nonsignificant p-values

Show your boss the following:

set.seed(100)
x1 <- runif(100,0,1)
x2 <- as.factor(sample(letters[1:3],100,replace=T))
y <- x1+x1*(x2=="a")+2*(x2=="b")+rnorm(100)
summary(lm(y~x1*x2))

Which gives:

             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.1525     0.3066  -0.498  0.61995
x1             1.8693     0.6045   3.092  0.00261 **
x2b            2.5149     0.4334   5.802 8.77e-08 ***
x2c            0.3089     0.4475   0.690  0.49180
x1:x2b        -1.1239     0.8022  -1.401  0.16451
x1:x2c        -1.0497 … Read more

Standard Deviation in R Seems to be Returning the Wrong Answer – Am I Doing Something Wrong?

Try this

R> sd(c(2,4,4,4,5,5,7,9)) * sqrt(7/8)
[1] 2
R>

and see the rest of the Wikipedia article for the discussion about estimation of standard deviations. Using the formula employed ‘by hand’ (with divisor N) leads to a biased estimate, which is why R’s sd() uses the divisor N-1; multiplying by sqrt(7/8) = sqrt((N-1)/N) converts R’s result back to the divisor-N value. Here is a key quote: The term standard deviation of the sample is used … Read more
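To make the two conventions explicit (a small check on the same data, not part of the original answer):

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
sqrt(sum((x - mean(x))^2) / length(x))       # divisor N,   the 'by hand' value: 2
sd(x)                                        # divisor N-1, R's default: about 2.138
sd(x) * sqrt((length(x) - 1) / length(x))    # rescale back to the divisor-N value: 2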

Pythonic way of detecting outliers in one dimensional observation data

The problem with using percentiles is that the points identified as outliers are a function of your sample size. There are a huge number of ways to test for outliers, and you should give some thought to how you classify them. Ideally, you should use a priori information (e.g. “anything above/below this value is unrealistic because…”) … Read more
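The first point is easy to demonstrate: a fixed-percentile rule flags roughly the same fraction of points whether or not any real outliers are present, so the number flagged simply grows with the sample size (a quick sketch in R):

set.seed(42)
x <- rnorm(1000)                     # a "clean" sample with no true outliers
cuts <- quantile(x, c(0.01, 0.99))
sum(x < cuts[1] | x > cuts[2])       # ~20 points flagged anyway; with n = 10000 it would be ~200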

Interpretation of ordered and non-ordered factors, vs. numerical predictors in model summary

This is not really a mixed-model-specific question, but rather a general question about model parameterization in R. Let’s try a simple example.

set.seed(101)
d <- data.frame(x=sample(1:4,size=30,replace=TRUE))
d$y <- rnorm(30,1+2*d$x,sd=0.01)

x as numeric

This just does a linear regression: the x parameter denotes the change in y per unit of change in x; the intercept … Read more
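As a quick check of the "x as numeric" case described above (using the d simulated above; not part of the original excerpt):

coef(lm(y ~ x, data = d))    # intercept near 1, slope near 2, matching the simulated 1 + 2*x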