Recoding variables with R

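Recoding can mean a lot of things, from renaming the levels of a factor to transforming or binning a continuous variable, and the right tool depends on which of these you need.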

Changing the levels of a factor can be done using the levels function:

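> library(survival) #provides the veteran data set
> #change the levels of a factor (new labels are matched to the old levels by position)
> levels(veteran$celltype) <- c("s","sc","a","l")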
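
Because that assignment matches labels by position, it is easy to mislabel a level if the existing order is not what you expect. As a minimal sketch (the factor f below is made-up toy data, not the veteran data set), a named list can be used to state each old-to-new mapping explicitly:

```r
# Toy factor (hypothetical data); default levels are sorted alphabetically
f <- factor(c("squamous", "large", "adeno", "smallcell", "large"))

# Names are the new labels, values are the old levels they replace
levels(f) <- list(s = "squamous", sc = "smallcell", a = "adeno", l = "large")

# Inspect the recoded values
as.character(f)
```

Checking levels(f) before and after the assignment is a cheap way to confirm the mapping did what you intended.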

Transforming a continuous variable simply involves the application of a vectorized function:

> mtcars$mpg.log <- log(mtcars$mpg) 
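
The same pattern works for any vectorized function. The sketch below uses only base R and the built-in mtcars data; the new column names (mpg.z, kpl) are just illustrative choices, and the conversion assumes mpg is in miles per US gallon, as the mtcars help page states:

```r
# Work on a copy of the built-in data set
cars <- mtcars

# Log transform, as in the example above
cars$mpg.log <- log(cars$mpg)

# Standardize to mean 0 and standard deviation 1
cars$mpg.z <- as.numeric(scale(cars$mpg))

# transform() adds a derived column in a single call
# (1 mile per US gallon is about 0.425144 km per litre)
cars <- transform(cars, kpl = mpg * 0.425144)

head(cars[c("mpg", "mpg.log", "mpg.z", "kpl")])
```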

For binning continuous data, look at cut (in base R) and cut2 (in the Hmisc package). For example:

> library(Hmisc) #provides cut2
> #make 4 groups with (approximately) equal sample sizes
> mtcars[['mpg.tr']] <- cut2(mtcars[['mpg']], g=4)
> #make 4 groups with equal bin width
> mtcars[['mpg.tr2']] <- cut(mtcars[['mpg']], 4, include.lowest=TRUE)
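
If Hmisc is not available, roughly equal-sized groups can also be sketched in base R by passing quantiles to cut as break points. This is an approximation of the idea, not a reimplementation of cut2: ties can make the group sizes unequal, and cut will stop with an error if two quantiles coincide.

```r
# Quartile break points of mpg (0%, 25%, 50%, 75%, 100%)
brks <- quantile(mtcars$mpg, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum value inside the first bin
mpg.q <- cut(mtcars$mpg, breaks = brks, include.lowest = TRUE)

# Group sizes should be close to equal
table(mpg.q)
```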

For recoding continuous or factor variables into a categorical variable, there is recode in the car package and recode.variables in the Deducer package:

> library(Deducer) #provides recode.variables
> mtcars[c("mpg.tr2")] <- recode.variables(mtcars[c("mpg")] , "Lo:14 -> 'low';14:24 -> 'mid';else -> 'high';")
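
For comparison, here is a rough base R sketch of a similar three-way split using cut with explicit break points. It assumes that a value of exactly 14 should land in 'low' and exactly 24 in 'mid'; the helper name mpg_band is made up for this example and is not part of any package.

```r
# Hypothetical helper: label values as low (<= 14), mid (14 to 24], high (> 24)
mpg_band <- function(x) {
  cut(x, breaks = c(-Inf, 14, 24, Inf), labels = c("low", "mid", "high"))
}

# Apply it to a copy of the built-in data set
cars <- mtcars
cars$mpg.band <- mpg_band(cars$mpg)

table(cars$mpg.band)
```

Since cut uses right-closed intervals by default, the upper boundary of each range belongs to the lower group; pass right = FALSE if you want the opposite convention.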

If you are looking for a GUI, Deducer implements recoding with the Transform and Recode dialogs:

http://www.deducer.org/pmwiki/pmwiki.php?n=Main.TransformVariables

http://www.deducer.org/pmwiki/pmwiki.php?n=Main.RecodeVariables