Re: [R] Bug in levels() function?

From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>
Date: Mon, 28 Jan 2008 11:18:57 +0100

Groot, Philip de wrote:
> Hello all,
>
> I am not sure whether it actually is a bug, but it is not the behaviour I would expect. Please consider this:
>
>
>> Sibships
>>
> [1] Patient_2400 Patient_2400 Patient_345 Patient_345 Patient_8901
> [6] Patient_8901 Patient_4008 Patient_4008 Patient_7991 Patient_7991
> [11] Patient_8353 Patient_8353 Patient_1212 Patient_1212 Patient_2168
> [16] Patient_2168 Patient_2760 Patient_2760 Patient_4726 Patient_4726
> [21] Patient_6699 Patient_6699 Patient_7641 Patient_7641 Patient_8263
> [26] Patient_8263 Patient_1389 Patient_1389 Patient_1618 Patient_1618
> [31] Patient_2410 Patient_2410 Patient_2612 Patient_2612 Patient_2721
> [36] Patient_2721 Patient_5053 Patient_5053 Patient_8458 Patient_8458
> [41] Patient_211 Patient_211 Patient_9004 Patient_9004 Patient_3423
> [46] Patient_3423 Patient_7413 Patient_7413 Patient_7815 Patient_7815
> [51] Patient_9232 Patient_9232 Patient_2267 Patient_2267 Patient_468
> [56] Patient_468
> 28 Levels: Patient_1212 Patient_1389 Patient_1618 Patient_211 ... Patient_9232
>
>
>> Comparison_Indices
>>
> [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
> [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
>
>> Sibships[Comparison_Indices]
>>
> [1] Patient_2400 Patient_2400 Patient_345 Patient_345 Patient_8901
> [6] Patient_8901 Patient_7413 Patient_7413
> 28 Levels: Patient_1212 Patient_1389 Patient_1618 Patient_211 ... Patient_9232
>
> The problem with this last command is that I would expect 4 levels, because only 8 "Comparison_Indices" are true, which corresponds to 4 sibships. So: levels() does not take array indices into account, or stated otherwise: if you take a subset of an array (vector), the levels() are not properly updated (in my opinion).
>
> What I additionally found is the following:
>
>> small_test <- factor(x=c("a", "b", "c"))
>> typeof(small_test)
>>
> [1] "integer"
>
> The same happens to the Sibships that I defined as a factor? Why is it of type integer?
>
> This is the version() output:
>
>> version
>>
> _
> platform x86_64-unknown-linux-gnu
> arch x86_64
> os linux-gnu
> system x86_64, linux-gnu
> status
> major 2
> minor 6.1
> year 2007
> month 11
> day 26
> svn rev 43537
> language R
> version.string R version 2.6.1 (2007-11-26)
>
>
> So: should I submit a Bug report?
>
>
No. This is all completely as designed. Factors are internally integers (group codes), with a levels attribute that says what the codes mean. If you want the full story, use dput(small_test) or class(small_test) or str(small_test).
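
A minimal sketch of that point, reusing the small_test factor from the question; nothing here goes beyond base R, and the comments describe what each call inspects rather than claiming exact printed output:

```r
# Same three-level factor as in the original question
small_test <- factor(c("a", "b", "c"))

typeof(small_test)    # storage type of the underlying group codes
class(small_test)     # the class that makes R treat the codes as a factor
unclass(small_test)   # the bare integer codes together with the levels attribute

# The codes index into the levels, which recovers the original labels
levels(small_test)[as.integer(small_test)]
```

So typeof() reports only how the codes are stored, while class() and levels() carry the information that this is a factor and what the codes mean.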

And subsetting a factor retains the original factor levels. To drop unused levels, just use factor(f[index]) or f[index, drop=TRUE]. The opposite behaviour (dropping unused levels on every subset) can be even more annoying/dangerous, because it leads to empty cells dropping out of tables and bars disappearing from barplots.
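
As a small illustration of both ways of dropping levels, and of the empty-cell effect, here is a sketch on a made-up factor (f and index are placeholder names for illustration, not the Sibships data from the question):

```r
# Hypothetical toy data: three groups, two observations each
f <- factor(c("a", "a", "b", "b", "c", "c"))
index <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)

sub <- f[index]
nlevels(sub)                     # the unused level "b" is retained
nlevels(factor(sub))             # re-creating the factor drops it
nlevels(f[index, drop = TRUE])   # same effect via the drop argument

table(sub)                       # keeps a zero-count cell for "b"
table(factor(sub))               # the empty cell is gone
```

Later R releases than the 2.6.1 used here also provide droplevels(), which does the same job as factor(f[index]).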

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 28 Jan 2008 - 10:21:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 28 Jan 2008 - 14:00:08 GMT.
