Re: [R] factor : how does it work ?

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Fri 07 Oct 2005 - 00:32:37 EST

On 10/6/2005 10:20 AM, Florence Combes wrote:

>> > > 2d I can't manage to deal with factors, so when I have some, I transform
>> > > them in vectors (with levels()), but I think I miss the power and utility
>> > > of the factor type ?
>> >
>> > levels() is not the conversion you want.

>
>
> in fact I use
> 'as.numeric(levels(f))[f]'
> (from the ?factor description)

That will only work if the levels have names that can be converted to numbers. In the example below, the levels are "a" and "b", so you'll get NA values if you try this.
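
A minimal sketch of that difference, assuming two made-up factors (f_num and f_chr are illustrative names, not from this thread):

```r
# Hypothetical factors for illustration only.
f_num <- factor(c("10", "20", "10"))   # level names look like numbers
f_chr <- factor(c("a", "b", "a"))      # level names are plain labels

# Convert the level names once, then index them by the factor's codes.
num_values <- as.numeric(levels(f_num))[f_num]

# Non-numeric level names cannot be converted, so every element is NA.
# R also emits a coercion warning, which is suppressed here.
chr_values <- suppressWarnings(as.numeric(levels(f_chr))[f_chr])

print(num_values)
print(chr_values)
```

The first call recovers the original numbers; the second gives only NA values, which is the failure described above.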
>
>> > That lists all the levels, but it doesn't tell you how they correspond to
>> > individual observations. For example,
>> >
>> > > df <- data.frame(x=1:3, y=c('a','b','a'))
>> > > df
>> >   x y
>> > 1 1 a
>> > 2 2 b
>> > 3 3 a
>> > > levels(df$y)
>> > [1] "a" "b"
>> >
>> > If you need to convert back to character values, use as.character():
>> >
>> > > as.character(df$y)
>> > [1] "a" "b" "a"
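
To make the quoted point concrete, here is a small sketch of how the levels, the per-observation codes, and the character values relate. It builds the factor explicitly with factor(), since in R 4.0.0 and later data.frame() no longer converts strings to factors by default; the variable names are illustrative assumptions.

```r
# Hypothetical factor mirroring the y column of the quoted example.
y <- factor(c("a", "b", "a"))

lev   <- levels(y)        # the distinct labels, one entry per level
codes <- as.integer(y)    # one integer code per observation
chars <- as.character(y)  # one label per observation

# Looking up each observation's code in the levels vector gives the
# same result as as.character().
rebuilt <- lev[codes]

print(lev)
print(codes)
print(chars)
print(identical(rebuilt, chars))
```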

>
>
> got it.
>
>
>> > 1. You can't compare the levels of a factor unless you declared it to
>> > be ordered:
>> >
>> > > df$y[1] > df$y[2]
>> > [1] NA
>> > Warning message:
>> > > not meaningful for factors in: Ops.factor(df$y[1], df$y[2])
>> >
>> > but
>> >
>> > > df$y <- ordered(df$y)
>> > > df$y[1] > df$y[2]
>> > [1] FALSE
>> >
>> > However, you need to watch out here: the comparison is done by the order
>> > of the factors

>
>
> I am sorry I don't understand this.
> here you compare the position of a in the factor and the position of b in
> the factor ?

It's the position of "a" in the levels() vector that is being compared. I declared that the factor had ordered levels, and R interprets that to mean that the first level is less than the second level, etc. This is useful if you want to use meaningful names for ordered categories. Comparison will be by the order of the categories, not by the name you chose.
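
As a rough illustration, assuming a made-up set of size categories (the names and data here are hypothetical, not from this thread):

```r
# Hypothetical ordered categories; the level order is given explicitly,
# otherwise factor() would sort the level names alphabetically.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)

# Comparison of ordered factors uses the position in levels().
size_cmp <- size[2] > size[1]     # "large" versus "small" as levels

# Comparison of plain strings is alphabetical instead.
alpha_cmp <- "large" > "small"

print(levels(size))
print(size_cmp)
print(alpha_cmp)
```

The two comparisons disagree because "large" comes last in the levels but sorts before "small" alphabetically.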

Duncan Murdoch

>
>> > , not an alphabetic comparison of their names:

>> >
>> > > levels(df$y) <- c("before", "after")
>> > > df
>> >   x      y
>> > 1 1 before
>> > 2 2  after
>> > 3 3 before
>> > > df$y[1] > df$y[2]
>> > [1] FALSE

>
>
> best regards,
>
> florence.
>

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Oct 07 00:43:39 2005
