[R] integer codes of factors

From: Mike R <mike.rstat_at_gmail.com>
Date: Fri 15 Jul 2005 - 07:04:55 EST


  U = c("b", "b", "b", "c", "d", "e", "e")

  F1 = factor( U, levels=c("a", "b", "c", "d", "e") )

  as.numeric(F1)
[1] 2 2 2 3 4 5 5

Here, the integer code of "b" in F1 is 2.

  K = factor( levels(F1) )
  as.numeric(K)
[1] 1 2 3 4 5
  K
[1] a b c d e
  Levels: a b c d e

And again, the integer code of "b" in K is 2. Great!

I am wondering how to modify that usage so that the correspondence between the two numeric vectors can be trusted. For example, the correspondence can be corrupted by placing the "a" at the end:

  F2 = factor( U, levels=c("b", "c", "d", "e", "a") )  

  as.numeric(F2)
[1] 1 1 1 2 3 4 4

Placing the "a" at the end changed the integer code of "b" in F2 to 1, which is not a problem. But ......

  K = factor( levels(F2) )
  as.numeric( K )
[1] 2 3 4 5 1

  K
[1] b c d e a
  Levels: a b c d e

But the integer code of "b" in K is now 2, which does not correspond to its code in F2.
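
A likely explanation, assuming the documented default of factor(): when no levels argument is given, the levels are taken as the sorted unique values of the input, so the ordering of levels(F2) is discarded. A minimal sketch that reproduces the mismatch:

```r
U  <- c("b", "b", "b", "c", "d", "e", "e")
F2 <- factor(U, levels = c("b", "c", "d", "e", "a"))

# With no explicit `levels`, factor() sorts the unique input values,
# so K's levels come out alphabetical instead of following F2's order.
K <- factor(levels(F2))

levels(F2)      # order as given when F2 was built
levels(K)       # alphabetical order

# The integer codes are positions within K's own (sorted) levels,
# which is why "b" gets code 2 in K but code 1 in F2.
as.integer(K)
```

Comparing levels(F2) with levels(K) side by side shows that only the level order differs; the set of levels is the same.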

One would think that ordered=TRUE ought to avoid the corruption, but it does not seem to accomplish that:

  K = factor( levels(F2), ordered=TRUE )
  as.numeric(K)
[1] 2 3 4 5 1

  K
[1] b c d e a
  Levels: a < b < c < d < e

But the integer code of "b" in K is still 2.
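
As far as I can tell, ordered=TRUE only marks the levels as ordered; it does not change which levels are chosen or their sequence. A small sketch of that check (K_plain and K_ordered are names introduced here for illustration):

```r
F2 <- factor(c("b", "b", "b", "c", "d", "e", "e"),
             levels = c("b", "c", "d", "e", "a"))

K_plain   <- factor(levels(F2))
K_ordered <- factor(levels(F2), ordered = TRUE)

# Both factors have the same (alphabetically sorted) levels and the same codes.
same_levels <- identical(levels(K_plain), levels(K_ordered))
same_codes  <- identical(as.integer(K_plain), as.integer(K_ordered))

# The only difference is that K_ordered carries the "ordered" class.
is_ord <- is.ordered(K_ordered)

print(c(same_levels = same_levels, same_codes = same_codes, is_ord = is_ord))
```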

However, corruption can be avoided with this idiom:

  K = factor( levels(F2), levels=levels(F2) )
  as.numeric(K)
[1] 1 2 3 4 5
  K
[1] b c d e a
  Levels: b c d e a

Now the integer code of "b" in K is 1, which, as desired, is in correspondence with its code in F2.
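
If only the label-to-code lookup is needed, a second factor may not be necessary at all. The following is a sketch under the assumption that the integer code of a label is simply its position in levels(F2); code_of is a hypothetical name used only for this example:

```r
U  <- c("b", "b", "b", "c", "d", "e", "e")
F2 <- factor(U, levels = c("b", "c", "d", "e", "a"))

# Named integer vector mapping each level label to its code in F2.
code_of <- setNames(seq_along(levels(F2)), levels(F2))
code_of[["b"]]

# match() gives the same position without building the lookup vector.
match("b", levels(F2))

# Consistency checks: the idiom above and the codes of F2 itself
# both agree with positions in levels(F2).
K <- factor(levels(F2), levels = levels(F2))
identical(as.integer(K), seq_along(levels(F2)))
identical(as.integer(F2), match(U, levels(F2)))
```

Either lookup stays consistent with F2 however its levels are ordered, because both read the order from levels(F2) directly.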



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
Received on Fri Jul 15 07:14:30 2005
