poly() in lm(): difference between raw vs. orthogonal

By default, with raw = FALSE, poly() computes an orthogonal polynomial. It internally sets up the model matrix with the raw coding x, x^2, x^3, … first and then transforms the columns (via a QR decomposition) so that each column is orthogonal to the intercept and to all lower-order columns. This does not change the fitted values, but it has the advantage that you can see whether a certain order in the polynomial significantly improves the regression over the lower orders.
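
As a small illustration of these properties (the toy vector x below is chosen just for demonstration), the columns of the default coding have mean zero and are orthonormal, whereas the raw coding simply contains the plain powers, which are strongly correlated:

x <- 1:10
Zo <- poly(x, 3)               # orthogonal coding (default, raw = FALSE)
Zr <- poly(x, 3, raw = TRUE)   # raw coding: x, x^2, x^3
zapsmall(colMeans(Zo))         # all zero: columns are orthogonal to the intercept
zapsmall(crossprod(Zo))        # identity matrix: columns are orthonormal
cor(Zr[, 1], Zr[, 2])          # raw columns x and x^2 are strongly correlated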

Consider the simple cars data with stopping distance as the response and driving speed as the regressor. Physically, this should be a quadratic relationship, but in this (old) dataset the quadratic term turns out not to be significant:

m1 <- lm(dist ~ poly(speed, 2), data = cars)
m2 <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)
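
Both codings span the same model space, so the fitted values (and hence the residuals, R^2, and the overall F statistic) agree:

all.equal(fitted(m1), fitted(m2))
## [1] TRUE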

In the orthogonal coding you get the following coefficients in summary(m1):

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       42.980      2.146  20.026  < 2e-16 ***
poly(speed, 2)1  145.552     15.176   9.591 1.21e-12 ***
poly(speed, 2)2   22.996     15.176   1.515    0.136    

This shows that there is a highly significant linear effect, while the second order is not significant. The latter p-value (i.e., the one for the highest order in the polynomial) is the same as in the raw coding:

                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                  2.47014   14.81716   0.167    0.868
poly(speed, 2, raw = TRUE)1  0.91329    2.03422   0.449    0.656
poly(speed, 2, raw = TRUE)2  0.09996    0.06597   1.515    0.136

but the lower order p-values change dramatically. The reason is that in model m1 the regressors are orthogonal while they are highly correlated in m2:

cor(model.matrix(m1)[, 2], model.matrix(m1)[, 3])
## [1] 4.686464e-17
cor(model.matrix(m2)[, 2], model.matrix(m2)[, 3])
## [1] 0.9794765

Thus, in the raw coding you can only interpret the p-value of speed if speed^2 remains in the model. And as both regressors are highly correlated, one of them can be dropped. However, in the orthogonal coding, speed^2 only captures the quadratic part that has not already been captured by the linear term. Then it becomes clear that there is a significant linear part, while the quadratic part has no additional significance.
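
A quick way to illustrate this is to refit both models without the quadratic term (the models m1a and m2a below are just for demonstration): in the orthogonal coding the linear coefficient stays exactly the same, because the first column of poly(speed, 2) is identical to poly(speed, 1), whereas in the raw coding the coefficient of speed changes completely once speed^2 is dropped.

m1a <- lm(dist ~ poly(speed, 1), data = cars)
m2a <- lm(dist ~ speed, data = cars)
coef(m1)[2]   # 145.552 in the full orthogonal model
coef(m1a)[2]  # 145.552 again: the orthogonal columns do not interfere
coef(m2)[2]   # 0.91329 in the full raw-coding model
coef(m2a)[2]  # changes substantially once speed^2 is dropped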
