Re: [R] factor documentation issue

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Wed 28 Feb 2007 - 05:56:35 GMT

Geoff Russell wrote:
> There is a warning in the documentation for ?factor (R version 2.3.0)
> as follows:
>
> " The interpretation of a factor depends on both the codes and the
> '"levels"' attribute. Be careful only to compare factors with the
> same set of levels (in the same order). In particular,
> 'as.numeric' applied to a factor is meaningless, and may happen by
> implicit coercion. To "revert" a factor 'f' to its original
> numeric values, 'as.numeric(levels(f))[f]' is recommended and
> slightly more efficient than 'as.numeric(as.character(f))'.
>
>
> But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
> always do anything useful.
>
> For example:
>
>
> > f<-factor(1:3,labels=c("A","B","C"))
> > f
> [1] A B C
> Levels: A B C
>
> > as.numeric(f)
> [1] 1 2 3
>
> > as.numeric(levels(f))[f]
> [1] NA NA NA
> Warning message:
> NAs introduced by coercion
>
> And also,
>
>
> > f<-factor(1:3,labels=c(1,5,6))
> > f
> [1] 1 5 6
> Levels: 1 5 6
>
> > as.numeric(f)
> [1] 1 2 3
>
> > as.numeric(levels(f))[f]
> [1] 1 5 6
>
> Is the documentation wrong, or is the code wrong, or have I missed
> something?
>

The documentation is somewhat unclear: The last sentence presupposes that the factor was generated from numeric data, i.e. the factor(c(7,9,13)) syndrome:

 > f <- factor (c(7,9,13))
 > f
[1] 7 9 13
Levels: 7 9 13
 > as.numeric(f)
[1] 1 2 3
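
To make the difference concrete, here is a small sketch (not taken from the help page itself) that puts the integer codes next to the two "revert" idioms quoted from ?factor. The variable names codes, via_levels, via_char, h and reverted are made up for illustration; the second factor repeats the labelled example from the original question.

```r
# Factor built from numeric data: the levels are the sorted unique values,
# stored as character strings ("7", "9", "13").
f <- factor(c(7, 9, 13))

codes      <- as.numeric(f)               # internal integer codes, not the data
via_levels <- as.numeric(levels(f))[f]    # converts each level once, then indexes
via_char   <- as.numeric(as.character(f)) # converts every element separately

# With non-numeric labels there is nothing numeric to revert to,
# so the conversion yields NA (with a coercion warning, suppressed here).
h <- factor(1:3, labels = c("A", "B", "C"))
reverted <- suppressWarnings(as.numeric(levels(h))[h])
```

Both idioms recover 7, 9, 13 for f, while codes only holds the positions of the levels. The levels-based form does the string-to-number conversion once per level rather than once per element, which is presumably the "slightly more efficient" the help page refers to.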

Also, the statement that as.numeric(f) is meaningless is a bit strong. Probably should say "meaningless without knowledge of the levels and their order". And you can actually compare factors with their levels in different order:

 > g <- factor (c("7",9,13))
 > g
[1] 7 9 13
Levels: 13 7 9
 > f==g
[1] TRUE TRUE TRUE
 > as.numeric(f)==as.numeric(g)
[1] FALSE FALSE FALSE

Where you need to be careful is that if you do things like

   sexsymbols <- c(16, 19)
   plot(x, y, pch=sexsymbols[sex])

then you should also do

   legend(x0, y0, legend=levels(sex), pch=sexsymbols)

in order to be sure the symbols match the legend. (Notice that indexing with [sex] implicitly coerces sex to numeric.)
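
A self-contained version of the plotting advice might look like the sketch below. The data in sex, x and y are invented purely for illustration, pch_used is a made-up helper name, and the legend position "topleft" stands in for the x0, y0 coordinates above. pdf(NULL) just opens a graphics device that writes no file, so the code runs non-interactively.

```r
# Hypothetical data, invented for this illustration only.
sex <- factor(c("male", "female", "female", "male"))
x <- c(1.0, 2.0, 3.0, 4.0)
y <- c(2.1, 3.9, 6.2, 7.8)

# One plotting symbol per level, in the order given by levels(sex).
sexsymbols <- c(16, 19)

# Indexing with a factor uses its integer codes, so each observation
# picks up the symbol belonging to its level.
pch_used <- sexsymbols[sex]

pdf(NULL)  # graphics device that produces no output file
plot(x, y, pch = pch_used)
# Pair the symbols with levels(sex) so the legend follows the same order.
legend("topleft", legend = levels(sex), pch = sexsymbols)
dev.off()
```

Because the legend labels come from levels(sex) rather than being typed by hand, the pairing of symbol and label stays correct even if the level order is not what you expected (here the levels sort alphabetically, so "female" comes first).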



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed Feb 28 17:18:51 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 28 Feb 2007 - 06:30:29 GMT.
