# Re: [R] factor : how does it work ?

From: Florence Combes <fcombes_at_gmail.com>
Date: Fri 07 Oct 2005 - 00:50:12 EST

I tried with a data frame called "merged" and a column named "Pcc_0h_A" (which contains numeric values):

> length(as.vector(merged\$Pcc_0h_A))
[1] 12202
> as.numeric(as.vector(merged\$Pcc_0h_A)[1:10])
[1] 12.276 11.958 14.098 13.843 12.451 11.745 NA NA NA NA
> ord <- ordered(merged\$Pcc_0h_A)
> length(ord)
[1] 12202
> ord[1:10]
[1] 12.276 11.958 14.098 13.843 12.451 11.745 <NA> <NA> <NA> <NA>
5386 Levels: 10.001 < 10.002 < 10.003 < 10.005 < 10.006 < 10.010 < ... < 9.999
> length(as.numeric(merged\$Pcc_0h_A))
[1] 12202
> as.numeric(merged\$Pcc_0h_A[1:10])
[1] 1812 1547 3308 3114 1960 1370 NA NA NA NA

Are these the level names converted into numbers? I don't think so, because the levels look like 10.001, 10.002, etc., and 1812, 1547, etc. are not in that form.
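A minimal sketch of what those numbers probably are, assuming the Pcc_0h_A column was read in as a factor (the four values below are a hypothetical stand-in for that column, taken from its printout). `factor()` sorts the distinct values as character strings, which would also explain why `ordered()` printed `10.001 < ... < 9.999`, and `as.numeric()` on a factor returns each element's integer code, i.e. its position in `levels()`. On that reading, 1812 would be the position of "12.276" among the 5386 levels.

```r
# Hypothetical stand-in for a numeric column that was read in as a factor
x <- factor(c("12.276", "11.958", "9.999", "10.001"))

# Levels are the distinct values sorted as text, so "9.999" comes last
levels(x)                      # "10.001" "11.958" "12.276" "9.999"

# as.numeric() on a factor returns the integer codes,
# i.e. each element's position in levels(x), not its label
codes <- as.numeric(x)
codes                          # 3 2 4 1

# Looking the codes up in levels(x) gives the labels back
levels(x)[codes]               # "12.276" "11.958" "9.999" "10.001"

# Three ways to get the actual numbers back
as.numeric(levels(x))[x]       # idiom suggested in ?factor
as.numeric(as.character(x))
as.numeric(as.vector(x))       # as.vector() on a factor gives character
```

Of the three, `as.numeric(levels(x))[x]` converts each distinct level only once, which is why `?factor` recommends it; `as.numeric(as.vector(x))`, the form that worked in the transcript, goes through the character labels just like `as.character()`.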

Thanks a million,

Florence

On 10/6/05, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
>
> On 10/6/2005 10:20 AM, Florence Combes wrote:
> >> > > 2nd: I can't manage to deal with factors, so when I have some, I transform
> >> > > them into vectors (with levels()), but I think I am missing the power and
> >> > > utility of the factor type?
> >> >
> >> > levels() is not the conversion you want.
> >
> >
> > In fact I use
> > 'as.numeric(levels(f))[f]'
> > (from the ?factor description)
>
> That will only work if the levels have names that can be converted to
> numbers. In the example below, the levels are "a" and "b", so you'll
> get NA values if you try this.
> >
> >> > That lists all the levels, but it doesn't tell you how they correspond
> >> > to individual observations. For example,
> >> >
> >> > > df <- data.frame(x=1:3, y=c('a','b','a'))
> >> > > df
> >> > x y
> >> > 1 1 a
> >> > 2 2 b
> >> > 3 3 a
> >> > > levels(df\$y)
> >> > [1] "a" "b"
> >> >
> >> > If you need to convert back to character values, use as.character():
> >> >
> >> > > as.character(df\$y)
> >> > [1] "a" "b" "a"
> >
> >
> > got it.
> >
> >
> >> > 1. You can't compare the levels of a factor unless you declared it to
> >> > be ordered:
> >> >
> >> > > df\$y[1] > df\$y[2]
> >> > [1] NA
> >> > Warning message:
> >> > ">" not meaningful for factors in: Ops.factor(df\$y[1], df\$y[2])
> >> >
> >> > but
> >> >
> >> > > df\$y <- ordered(df\$y)
> >> > > df\$y[1] > df\$y[2]
> >> > [1] FALSE
> >> >
> >> > However, you need to watch out here: the comparison is done by the order
> >> > of the factors
> >
> >
> > I am sorry, I don't understand this.
> > Here you compare the position of a in the factor and the position of b in
> > the factor?
>
> It's the position of "a" in the levels() vector that is being compared.
> I declared that the factor had ordered levels, and R interprets that
> to mean that the first level is less than the second level, etc. This
> is useful if you want to use meaningful names for ordered categories.
> Comparison will be by the order of the categories, not by the name you
> chose.
>
> Duncan Murdoch
>
> >
> >> > , not an alphabetic comparison of their names:
> >> >
> >> > > levels(df\$y) <- c("before", "after")
> >> > > df
> >> > x y
> >> > 1 1 before
> >> > 2 2 after
> >> > 3 3 before
> >> > > df\$y[1] > df\$y[2]
> >> > [1] FALSE
> >
> >
> > best regards,
> >
> > florence.
> >
>
>
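The conversions discussed in the quoted messages can be contrasted in a short sketch. One assumption worth stating: since R 4.0.0, `data.frame()` no longer turns character columns into factors by default, so the sketch calls `factor()` explicitly to reproduce Duncan's example. `as.character()` maps each observation to its label, while `as.numeric(levels(f))[f]` only makes sense when the labels look like numbers; for "a" and "b" it yields `NA` (with a coercion warning, suppressed here). The second factor `f` is a hypothetical example with number-like labels.

```r
# Duncan's example, with factor() made explicit (data.frame() keeps
# strings as character by default from R 4.0.0 on)
df <- data.frame(x = 1:3, y = factor(c("a", "b", "a")))

levels(df$y)        # the distinct labels only: "a" "b"
as.character(df$y)  # one label per observation: "a" "b" "a"
as.integer(df$y)    # the underlying codes: 1 2 1

# as.numeric(levels(f))[f] needs number-like labels; letters become NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
suppressWarnings(as.numeric(levels(df$y))[df$y])  # NA NA NA

# With number-like labels the same idiom recovers the values
f <- factor(c("2.5", "10", "2.5"))
as.numeric(levels(f))[f]  # 2.5 10.0 2.5
```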

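Duncan's point that comparisons on an ordered factor use positions in `levels()` rather than the label text can be checked directly. This sketch mirrors his relabeling example; the `levels =` argument in the last part, which fixes the order explicitly instead of relying on sorted labels, and the names `y` and `z` are illustrative additions not used in the thread.

```r
y <- ordered(c("a", "b", "a"))  # levels: a < b
y[1] > y[2]                     # FALSE: level 1 is not above level 2

# Relabel the levels; positions, and hence comparisons, are unchanged
levels(y) <- c("before", "after")
y[1] > y[2]                     # still FALSE, compared by position
"before" > "after"              # TRUE when compared as plain strings

# Choosing the level order yourself
z <- ordered(c("low", "high", "medium"), levels = c("low", "medium", "high"))
z < "high"                      # TRUE FALSE TRUE
```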

R-help@stat.math.ethz.ch mailing list