How to reference variables by character string in a formula?

I see a couple of issues going on here. First, though I don’t think this is causing any trouble, let’s make your data frame in one step so you don’t have v1 through v4 floating around both in the global environment and in the data frame. Second, let’s just make v2 a factor here so that we won’t have to deal with making it a factor later.

dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(sample(c(0,1), 10, replace=TRUE)),
                  v3 = rnorm(10),
                  v4 = rnorm(10) )

Part One

Now, for your first part, it looks like this is what you want:

lm(v1 ~ v2 + v3 + v4, data=dat)

Here’s a simpler way to do that, though you still have to specify the response variable.

lm(v1 ~ ., data=dat)
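
To see what the dot stands for, here is a small sketch that expands the formula with terms. The fixed seed and the deterministic v2 are my own assumptions, used only so the example is reproducible.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # assumption: fixed levels instead of sample()
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# terms() expands "." into every column of dat except the response
dot_terms <- attr(terms(v1 ~ ., data = dat), "term.labels")
dot_terms
# [1] "v2" "v3" "v4"
```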

Alternatively, you certainly can build up the formula as a string with paste and call lm on it.

f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))
# "v1 ~ v2 + v3 + v4"
lm(f, data=dat)
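
As a side note, base R also has reformulate, which builds a formula object from a character vector of term names without any manual pasting. This is a different route from the paste approach above, sketched here with an assumed reproducible data frame.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # assumption: fixed levels instead of sample()
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# First column is the response, all remaining columns are predictors
f2 <- reformulate(names(dat)[-1], response = names(dat)[1])
f2
# v1 ~ v2 + v3 + v4

fit <- lm(f2, data = dat)
```

Because f2 is already a formula, no as.formula step is needed before passing it to lm.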

However, my preference in these situations is to use do.call, which evaluates its arguments before passing them to the function, so the call stored in the fitted object contains the actual formula rather than the name f. This makes the resulting object more suitable for calling functions like update on. Compare the call part of the output.

do.call("lm", list(as.formula(f), data=as.name("dat")))
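
A sketch of that comparison, assuming a reproducible version of dat (the seed, the fixed v2, and the names fit_direct, fit_docall and fit_small are mine):

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # assumption: fixed levels instead of sample()
                  v3 = rnorm(10),
                  v4 = rnorm(10))

f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse = " + "))

fit_direct <- lm(f, data = dat)
fit_docall <- do.call("lm", list(as.formula(f), data = as.name("dat")))

fit_direct$call  # lm(formula = f, data = dat)
fit_docall$call  # lm(formula = v1 ~ v2 + v3 + v4, data = dat)

# The stored call no longer refers to f, so it can be re-evaluated later
# even if f has been changed or removed in the meantime
fit_small <- update(fit_docall, . ~ . - v4)
```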

Part Two

About your second part, it looks like this is what you’re going for:

lm(v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4, data=dat)

First, because v2 is already a factor in the data frame, we don’t need the factor() wrapper. Second, this can be simplified further by using R’s formula operators to create interactions: v2*v3 expands to v2 + v3 + v2:v3, so the whole model can be written like this.

lm(v1 ~ v2*(v3 + v4), data=dat)
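
To convince yourself that the two formulas describe the same model, a quick check is to compare their expanded term labels (the names long and short are just for illustration):

```r
# Expand both formulas into their individual model terms
long  <- attr(terms(v1 ~ v2 + v3 + v4 + v2*v3 + v2*v4), "term.labels")
short <- attr(terms(v1 ~ v2*(v3 + v4)), "term.labels")

short
# [1] "v2"    "v3"    "v4"    "v2:v3" "v2:v4"

same_terms <- setequal(long, short)  # TRUE: both contain the same five terms
```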

I’d then simply create the formula using paste; the loop with assign, even in the larger case, is probably not a good idea.

f <- paste(names(dat)[1], "~", names(dat)[2], "* (", 
           paste(names(dat)[-c(1:2)], collapse=" + "), ")")
# "v1 ~ v2 * ( v3 + v4 )"

It can then be called using either lm directly or with do.call.

lm(f, data=dat)
do.call("lm", list(as.formula(f), data=as.name("dat")))

About your code

The problem you had with trying to use r3 etc. was that you wanted the contents of the variable r3, not the literal string "r3". To look up a variable by its name, you need get, like this, and then you’d collapse the values together with paste.

vars <- sapply(paste0("r", 3:6), get)
paste(vars, collapse=" + ")
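
Here is a minimal sketch of the difference between a name and its contents. The values given to r3 and r4 are hypothetical stand-ins for whatever your assign loop created, and mget is used as a vectorised relative of get.

```r
# Hypothetical stand-ins for the variables created with assign()
r3 <- "v3"
r4 <- "v4"

var_names <- paste0("r", 3:4)  # "r3" "r4": only the names, not the contents

# mget() looks up several variables by name in the current environment
vars <- unlist(mget(var_names, envir = environment()))

rhs <- paste(vars, collapse = " + ")
rhs
# [1] "v3 + v4"
```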

However, a better way would be to avoid assign and just build a vector of the terms you want, like this.

vars <- NULL
for (v in 3:4) {
  vars <- c(vars, colnames(dat)[v], paste(colnames(dat)[2], 
                                          colnames(dat)[v], sep="*"))
}
paste(vars, collapse=" + ")

A more R-like solution would be to use lapply:

vars <- unlist(lapply(colnames(dat)[3:4], 
                      function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))
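
Putting it together, the term vector can be pasted into a formula and fitted. The sketch below assumes a reproducible dat (fixed seed and fixed v2 are mine) and checks that the result matches the compact v2*(v3 + v4) form.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # assumption: fixed levels instead of sample()
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# Each remaining column contributes itself plus its interaction with v2
vars <- unlist(lapply(colnames(dat)[3:4],
                      function(x) c(x, paste(colnames(dat)[2], x, sep = "*"))))

f <- paste(colnames(dat)[1], "~", paste(vars, collapse = " + "))
f
# [1] "v1 ~ v3 + v2*v3 + v4 + v2*v4"

fit_built <- lm(as.formula(f), data = dat)
fit_short <- lm(v1 ~ v2*(v3 + v4), data = dat)

# Same model, so the fitted values agree up to numerical tolerance
same_fit <- all.equal(unname(fitted(fit_built)), unname(fitted(fit_short)))
```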
