From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>

Date: Wed 28 Feb 2007 - 05:56:35 GMT

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed Feb 28 17:18:51 2007

Geoff Russell wrote:

> There is a warning in the documentation for ?factor (R version 2.3.0)
> as follows:
>
> " The interpretation of a factor depends on both the codes and the
> '"levels"' attribute. Be careful only to compare factors with the
> same set of levels (in the same order). In particular,
> 'as.numeric' applied to a factor is meaningless, and may happen by
> implicit coercion. To "revert" a factor 'f' to its original
> numeric values, 'as.numeric(levels(f))[f]' is recommended and
> slightly more efficient than 'as.numeric(as.character(f))'."
>
> But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
> always do anything useful.
>
> For example:
>
> > f<-factor(1:3,labels=c("A","B","C"))
> > f
> [1] A B C
> Levels: A B C
> > as.numeric(f)
> [1] 1 2 3
> > as.numeric(levels(f))[f]
> [1] NA NA NA
> Warning message:
> NAs introduced by coercion
>
> And also,
>
> > f<-factor(1:3,labels=c(1,5,6))
> > f
> [1] 1 5 6
> Levels: 1 5 6
> > as.numeric(f)
> [1] 1 2 3
> > as.numeric(levels(f))[f]
> [1] 1 5 6
>
> Is the documentation wrong, or is the code wrong, or have I missed
> something?

The documentation is somewhat unclear: The last sentence presupposes that the factor was generated from numeric data, i.e. the factor(c(7,9,13)) syndrome:

> f <- factor(c(7,9,13))
> f
[1] 7 9 13
Levels: 7 9 13
> as.numeric(f)
[1] 1 2 3
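
To make the contrast concrete, here is a small sketch (not part of the original reply; the variable names are only illustrative) that puts the integer codes side by side with the two ways of recovering the numbers that the help page mentions. It assumes, as above, that the factor was built from numeric data.

```r
# A factor built from numeric data: the levels are the sorted unique
# values, stored as character strings.
f <- factor(c(7, 9, 13, 9))

codes  <- as.numeric(f)                # internal codes: positions in levels(f)
viaLev <- as.numeric(levels(f))[f]     # convert each level once, then index by code
viaChr <- as.numeric(as.character(f))  # convert the label of every element

print(codes)
print(viaLev)
print(viaChr)
```

The two "revert" forms agree with each other; the first only has to convert one string per level rather than one per element, which is presumably the efficiency the help page refers to.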

Also, the statement that as.numeric(f) is meaningless is a bit strong. Probably should say "meaningless without knowledge of the levels and their order". And you can actually compare factors with their levels in different order:

> g <- factor(c("7",9,13))
> g
[1] 7 9 13
Levels: 13 7 9
> f==g
[1] TRUE TRUE TRUE
> as.numeric(f)==as.numeric(g)
[1] FALSE FALSE FALSE

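As a hedged illustration of why the codes disagree while the labels agree, the sketch below (my own addition, with g2 as a made-up name) inspects the two level vectors and then rebuilds g on the level order of f, after which the codes line up as well.

```r
f <- factor(c(7, 9, 13))     # levels sorted numerically: "7" "9" "13"
g <- factor(c("7", 9, 13))   # input coerced to character, so levels sort as strings

print(levels(f))
print(levels(g))
print(as.integer(g))         # codes refer to positions in levels(g)

# Re-express g using the level order of f; the labels are unchanged,
# only the codes are renumbered.
g2 <- factor(g, levels = levels(f))
print(as.integer(g2))
print(all(as.numeric(f) == as.numeric(g2)))
```
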
Where you need to be careful is that if you do things like

sexsymbols <- c(16, 19)
plot(x, y, pch=sexsymbols[sex])

then you should also do

legend(x0, y0, legend=levels(sex), pch=sexsymbols)

in order to be sure the symbols match the legend. (Notice that indexing with [sex] implicitly coerces sex to numeric.)
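
The same mapping can be checked without drawing anything. The sketch below is an illustration only: the data in sex are invented, and pchPerPoint and key are hypothetical names. It shows which symbol each observation would get and the level-to-symbol table that the legend call relies on.

```r
# Invented example data; levels are sorted alphabetically: "female" "male"
sex <- factor(c("male", "female", "female", "male"))
sexsymbols <- c(16, 19)

# Indexing by a factor uses its integer codes, so level 1 ("female")
# picks 16 and level 2 ("male") picks 19.
pchPerPoint <- sexsymbols[sex]
print(pchPerPoint)

# The pairing a legend built from levels(sex) and sexsymbols would show.
key <- data.frame(level = levels(sex), pch = sexsymbols)
print(key)
```

Because both the plot symbols and the legend are derived from the same levels(sex) order, they stay consistent even if the level order is not the one you expected.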


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 28 Feb 2007 - 06:30:29 GMT.
