From: Spencer Graves <spencer.graves_at_pdf.com>

Date: Fri 15 Jul 2005 - 13:26:38 EST

What's the problem? As suggested by the help page, the numeric codes are assigned in the order the names appear in the levels argument. Consider the following example from the help page plus a minor modification:

> (ff <- factor(substring("statistics", 1:10, 1:10), levels=letters))
[1] s t a t i s t i c s
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> as.numeric(ff)
[1] 19 20 1 20 9 19 20 9 3 19

> (ff <- factor(substring("statistics", 1:10, 1:10), levels=letters[3:1]))
[1] <NA> <NA> a <NA> <NA> <NA> <NA> <NA> c <NA>
Levels: c b a
> as.numeric(ff)
[1] NA NA 3 NA NA NA NA NA 1 NA
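
Put another way, the integer code of each element is simply its position in the levels vector, and anything not found there becomes NA. A minimal sketch of that reading, using match() as a stand-in (the variable names x, codes and pos are only for illustration):

```r
# Characters of "statistics", as in the help-page example above
x <- substring("statistics", 1:10, 1:10)

# Integer codes of a factor are positions in its levels vector
ff <- factor(x, levels = letters)
codes <- as.integer(ff)
pos <- match(x, letters)
identical(codes, pos)    # TRUE

# Values that are missing from 'levels' become NA in both cases
rev_levels <- letters[3:1]
ff2 <- factor(x, levels = rev_levels)
codes2 <- as.integer(ff2)
pos2 <- match(x, rev_levels)
identical(codes2, pos2)  # TRUE
```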

I highly recommend Venables and Ripley (2002) Modern Applied Statistics with S (Springer) and V & R (2000) S Programming (Springer).
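
For the F2 example in the quoted message below, the same rule explains both results: factor(levels(F2)) sorts the levels again, while passing levels=levels(F2) keeps the original order. A small sketch, assuming the goal is a lookup from level name to integer code (the names k_codes and code_of are hypothetical, not from the original post):

```r
U <- c("b", "b", "b", "c", "d", "e", "e")
F2 <- factor(U, levels = c("b", "c", "d", "e", "a"))

# Re-use the level order explicitly so the codes line up with F2
K <- factor(levels(F2), levels = levels(F2))
k_codes <- as.integer(K)           # 1 2 3 4 5

# Equivalent lookup table from level name to integer code
code_of <- setNames(seq_along(levels(F2)), levels(F2))
code_of[["b"]]                     # 1, the same code "b" has in F2

# The table reproduces the codes stored in F2
identical(as.integer(F2), unname(code_of[U]))  # TRUE
```

Since the codes are just positions in levels(F2), seq_along(levels(F2)) gives the same integers without building a second factor.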

spencer graves

Mike R wrote:

> U = c("b", "b", "b", "c", "d", "e", "e")
>
> F1 = factor( U, levels=c("a", "b", "c", "d", "e") )
>
> as.numeric(F1)
> [1] 2 2 2 3 4 5 5
>
> Here, the integer code of "b" in F1 is 2.
>
> K = factor( levels(F1) )
> as.numeric(K)
> [1] 1 2 3 4 5
> K
> [1] a b c d e
> Levels: a b c d e
>
> And again, the integer code of "b" in K is 2. Great!
>
> I am wondering how to modify that usage so that the correspondence between
> the two numeric vectors can be trusted. For example, the correspondence
> can be corrupted by placing the "a" at the end:
>
> F2 = factor( U, levels=c("b", "c", "d", "e", "a") )
>
> as.numeric(F2)
> [1] 1 1 1 2 3 4 4
>
> Placing the "a" at the end changed the integer code of "b" in F2 to 1, which is
> not a problem. But ......
>
> K = factor( levels(F2) )
> as.numeric( K )
> [1] 2 3 4 5 1
> K
> [1] b c d e a
> Levels: a b c d e
>
> But the integer code of "b" in K is now 2, which does not correspond to its code
> in F2.
>
> One would think that ordered=TRUE ought to avoid the corruption, but it does not
> seem to accomplish that:
>
> K = factor( levels(F2), ordered=TRUE )
> as.numeric(K)
> [1] 2 3 4 5 1
> K
> [1] b c d e a
> Levels: a < b < c < d < e
>
> But the integer code of "b" in K is still 2.
>
> However, corruption can be avoided with this idiom:
>
> K = factor( levels(F2), levels=levels(F2) )
> as.numeric(K)
> [1] 1 2 3 4 5
> K
> [1] b c d e a
> Levels: b c d e a
>
> Now the integer code of "b" in K is 1, which, as desired, is in
> correspondence with its code in F2.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA
spencer.graves@pdf.com
www.pdf.com <http://www.pdf.com>
Tel: 408-938-4420
Fax: 408-280-7915

Received on Fri Jul 15 13:32:34 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:33:43 EST