Date: Wed 28 Feb 2007

Geoff Russell wrote:

> There is a warning in the documentation for ?factor (R version 2.3.0)

> " The interpretation of a factor depends on both the codes and the
> '"levels"' attribute. Be careful only to compare factors with the
> same set of levels (in the same order). In particular,
> 'as.numeric' applied to a factor is meaningless, and may happen by
> implicit coercion. To "revert" a factor 'f' to its original
> numeric values, 'as.numeric(levels(f))[f]' is recommended and
> slightly more efficient than 'as.numeric(as.character(f))'. "
>
> But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
> always do anything useful.
>
> > f<-factor(1:3,labels=c("A","B","C"))
> > f
> [1] A B C
> Levels: A B C
> > as.numeric(f)
> [1] 1 2 3
> > as.numeric(levels(f))[f]
> [1] NA NA NA
> Warning message:
> NAs introduced by coercion
>
> And also,
>
> > f<-factor(1:3,labels=c(1,5,6))
> > f
> [1] 1 5 6
> Levels: 1 5 6
> > as.numeric(f)
> [1] 1 2 3
> > as.numeric(levels(f))[f]
> [1] 1 5 6
>
> Is the documentation wrong, or is the code wrong, or have I missed
> something?
The documentation is somewhat unclear: the last sentence of the warning presupposes that the factor was generated from numeric data, i.e. the factor(c(7,9,13)) syndrome:

> f <- factor(c(7,9,13))
> f
[1] 7 9 13
Levels: 7 9 13
> as.numeric(f)
[1] 1 2 3
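
To put the two conversions side by side, here is a minimal sketch. The variable names (codes, via_levels, via_char, labelled) are mine, chosen for illustration only. It recovers the original numbers from a factor built from numeric data, and shows why the same idiom gives NA when the labels are not numbers, as in the "A", "B", "C" example from the question.

```r
# Factor built from numeric data, as in the session above.
f <- factor(c(7, 9, 13))

codes      <- as.numeric(f)                 # internal integer codes
via_levels <- as.numeric(levels(f))[f]      # convert the levels, then index by f
via_char   <- as.numeric(as.character(f))   # convert every element

print(codes)
print(via_levels)
print(via_char)

# With non-numeric labels there are no numbers to recover, so the
# levels convert to NA (the coercion warning is suppressed here).
labelled <- factor(1:3, labels = c("A", "B", "C"))
not_numbers <- suppressWarnings(as.numeric(levels(labelled)))[labelled]
print(not_numbers)
```

In other words, as.numeric(levels(f))[f] is only a "revert" when the levels are themselves the printed form of numbers.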

Also, the statement that as.numeric(f) is meaningless is a bit strong. Probably should say "meaningless without knowledge of the levels and their order". And you can actually compare factors with their levels in different order:

> g <- factor(c("7",9,13))
> g
[1] 7 9 13
Levels: 13 7 9
> f==g
[1] TRUE TRUE TRUE
> as.numeric(f)==as.numeric(g)
[1] FALSE FALSE FALSE
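
If the codes themselves need to be comparable, one option (a sketch of my own, not something the documentation prescribes) is to compare the labels as character strings, or to rebuild one factor on the other's levels first. The names same_labels, g2 and same_codes are illustrative.

```r
f <- factor(c(7, 9, 13))        # levels in numeric order: "7" "9" "13"
g <- factor(c("7", 9, 13))      # levels sorted as strings: "13" "7" "9"

# Comparing the labels does not depend on the level order.
same_labels <- as.character(f) == as.character(g)

# Rebuild g on f's levels; after that the integer codes line up too.
g2 <- factor(g, levels = levels(f))
same_codes <- as.numeric(f) == as.numeric(g2)

print(same_labels)
print(same_codes)
```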
Where you do need to be careful is with constructs like

sexsymbols <- c(16, 19)

plot(x, y, pch=sexsymbols[sex])

in which case you should also do

legend(x0, y0, legend=levels(sex), pch=sexsymbols)

in order to be sure that the symbols match the legend. (Notice that indexing with [sex] implicitly coerces sex to numeric.)
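
A small self-contained version of this, with made-up data, might look as follows. The values of x, y and sex are hypothetical, as are the names point_pch and key, and I use the "topleft" keyword for the legend position in place of the x0, y0 coordinates above.

```r
# Hypothetical data; only the names x, y, sex and sexsymbols come from the text.
sex <- factor(c("male", "female", "female", "male"))
x <- c(1.2, 2.5, 3.1, 4.8)
y <- c(2.0, 1.7, 3.9, 3.3)

sexsymbols <- c(16, 19)   # one plotting symbol per level, in levels(sex) order

# Indexing by a factor uses its integer codes: female -> 16, male -> 19.
point_pch <- sexsymbols[sex]

# The pairing of level and symbol that the legend will display.
key <- data.frame(level = levels(sex), pch = sexsymbols)
print(key)

if (interactive()) {   # only draw when run in an interactive session
  plot(x, y, pch = point_pch)
  legend("topleft", legend = levels(sex), pch = sexsymbols)
}
```

Because both the points and the legend take their order from levels(sex), they stay in agreement even if the levels are later reordered.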

