Re: [R] Is this an artifact of using "which"?

From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>
Date: Mon, 14 Apr 2008 15:03:15 +0200

Richard.Cotton_at_hsl.gov.uk wrote:
>> I used "which" to obtain a subset of values from my data.frame.
>> however, I find that there is a "trace" of the values I have removed.
>> Any suggestions would be greatly appreciate.
>>
>> Below is my data:
>>
>> d <- data.frame( val = 1:10,
>> group = sample(LETTERS[1:5], 10, replace=TRUE) )
>>
>> >d
>> val group
>> 1 1 B
>> 2 2 E
>> 3 3 B
>> 4 4 C
>> 5 5 A
>> 6 6 B
>> 7 7 A
>> 8 8 E
>> 9 9 E
>> 10 10 A
>>
>> ## selecting everything that is not group "A"
>> d<-d[which(d$group !="A"),]
>>
>> > d
>> val group
>> 1 1 B
>> 2 2 E
>> 3 3 B
>> 4 4 C
>> 6 6 B
>> 8 8 E
>> 9 9 E
>>
>> > levels(d$group)
>> [1] "A" "B" "C" "E"
>>

>
> The (imho) unintuitive behaviour is to do with the subsetting function 
> [.factor, not which.  There are a couple of workarounds:
>   

In that case, your intuition needs readjustment....

There are other systems which (de facto) drop unused levels by default, and that is a real pain to work around, especially for subgroup analyses. E.g. there is no way to get PROC FREQ in SAS to include a count of zero, and barplots of ratings from 0 to 10 lose columns "randomly" in SPSS (this _can_ be worked around, though).

Anyway, dropping levels by default is illogical: there's no reason that a tabulation of gender distribution for (say) tenured CS professors should suddenly pretend that the female gender does not exist!
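
To make the point concrete, here is a minimal sketch with made-up ratings on a 0 to 10 scale (the data and the variable names ratings, high, counts and dropped are purely illustrative, not from the thread). Subsetting keeps all the levels, so table() still reports the empty categories as zero counts, and dropping them is an explicit extra step:

```r
# Hypothetical ratings on a 0-10 scale; nobody happened to answer 0, 1 or 10.
ratings <- factor(c(3, 7, 7, 5, 9, 3, 7), levels = 0:10)

# Restrict to a subgroup (ratings above 4); the unused levels are retained.
high <- ratings[as.integer(as.character(ratings)) > 4]

# All 11 levels are tabulated, the empty ones with a count of 0.
counts <- table(high)
print(counts)

# Dropping unused levels is something you ask for explicitly.
dropped <- table(factor(high))
print(dropped)
```

A barplot(counts) would then show all eleven columns, including the empty ones, which is the behaviour argued for above.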

> 1. Call factor to recreate the levels, and get rid of "A"
> factor(d$group)
>
> 2. Redefine [.factor; see dropUnusedLevels in the Hmisc package.
>
> Regards,
> Richie.
>
> Mathematical Sciences Unit
> HSL
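
A short sketch of workaround 1 applied to the data frame from the question. The group values are typed in from the printed output rather than sampled, so the result is reproducible, and factor() is called explicitly because recent versions of R (4.0.0 onward) no longer convert strings to factors by default in data.frame(). The drop = TRUE argument of the factor subsetting method is shown as well; the name g is only for illustration.

```r
# The data from the question, with the groups fixed instead of sampled.
d <- data.frame(val = 1:10,
                group = factor(c("B", "E", "B", "C", "A",
                                 "B", "A", "E", "E", "A")))

# Selecting everything that is not group "A" keeps "A" as a level.
d <- d[which(d$group != "A"), ]
before <- levels(d$group)
print(before)

# Workaround 1: rebuild the factor and assign it back to the column.
d$group <- factor(d$group)
after <- levels(d$group)
print(after)

# Alternative on a bare factor: ask the subsetting method to drop levels.
g <- factor(c("B", "E", "A", "C"))
g <- g[g != "A", drop = TRUE]
print(levels(g))
```

Later releases of R also provide droplevels(), which should do the same for every factor column of a data frame; as far as I know it did not exist when this thread was written.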

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 14 Apr 2008 - 13:11:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 14 Apr 2008 - 13:30:28 GMT.
