I think the way to understand the difference between the labels and levels arguments (ignoring the labels() function that Tommy describes in his answer) is that levels tells R which values to look for in the input (x) and what order to use for the levels of the resulting factor object, while labels changes the values of the levels after the input has been coded as a factor. As suggested by Tommy’s answer, there is no part of the factor object returned by factor() that is called labels; there are just the levels, which have been adjusted by the labels argument (clear as mud).
For example:
> f <- factor(x=c("a","b","c"),levels=c("c","d","e"))
> f
[1] <NA> <NA> c
Levels: c d e
> str(f)
Factor w/ 3 levels "c","d","e": NA NA 1
Because the first two elements of x were not found in levels, the first two elements of f are NA. Because "d" and "e" were included in levels, they show up in the levels of f even though they did not occur in x.
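If it helps, here is a small sketch for poking at the same factor to see both effects directly. It only uses base R functions (is.na(), as.integer(), levels(), table()); nothing here is specific to the original question beyond the x and levels values used above.

```r
x <- c("a", "b", "c")
f <- factor(x, levels = c("c", "d", "e"))

# Which input values were not found among the requested levels?
is.na(f)        # TRUE TRUE FALSE

# The underlying integer codes: only "c" matched, and it is level 1.
as.integer(f)   # NA NA 1

# Unused levels are kept, and table() reports them with a count of zero.
levels(f)       # "c" "d" "e"
table(f)        # c: 1, d: 0, e: 0
```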
Now with labels:
> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("C","D","E"))
> f
[1] <NA> <NA> C
Levels: C D E
After R figures out what should be in the factor, it re-codes the levels. One can of course use this to do brain-frying things such as:
> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("a","b","c"))
> f
[1] <NA> <NA> a
Levels: a b c
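One way to un-fry the brain is to check that labels are paired with levels purely by position and that the integer codes do not change. The following is just an illustrative sketch; the names lev, f_plain and f_lab are mine, not part of the question.

```r
x   <- c("a", "b", "c")
lev <- c("c", "d", "e")

f_plain <- factor(x, levels = lev)
f_lab   <- factor(x, levels = lev, labels = c("a", "b", "c"))

# The codes are the same with and without labels; only the level names differ.
identical(as.integer(f_plain), as.integer(f_lab))  # TRUE

# Positional mapping: input value "c" is now displayed as "a", and so on.
data.frame(level = lev, label = levels(f_lab))

# The one matched element ("c" in x) therefore comes back as "a".
as.character(f_lab)  # NA NA "a"
```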
Another way to think about labels is that factor(x,levels=L1,labels=L2) is equivalent to
f <- factor(x,levels=L1)
levels(f) <- L2
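As a quick sanity check of that equivalence, here is a sketch that builds the factor both ways and compares the results. The concrete values of x, L1 and L2 are simply the ones from the examples above, and I have only tried this for the simple case of distinct labels with one label per level.

```r
x  <- c("a", "b", "c")
L1 <- c("c", "d", "e")
L2 <- c("C", "D", "E")

# One step: levels and labels supplied together.
f1 <- factor(x, levels = L1, labels = L2)

# Two steps: code the factor first, then rename its levels.
f2 <- factor(x, levels = L1)
levels(f2) <- L2

identical(f1, f2)  # TRUE
```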
I think an appropriately phrased version of this example might be nice for Pat Burns’s R Inferno: there are plenty of factor puzzles in section 8.2, but not this particular one …