# Re: [R] R newbie: logical subsets

From: Joshua Tokle <jtokle_at_math.washington.edu>
Date: Sat 15 Jul 2006 - 07:13:46 EST

This is exactly what I needed. Thanks for your help, Greg and Gabor.

I'm looking forward to replacing a dozen stored procedures, temp tables, and database calls with a one page R script.

Josh

On Wed, 12 Jul 2006, Greg Snow wrote:

> Gabor, your solution does not take the groups into account. How about
> something like:
>
> iris2 <- iris
> iris2\$m <- ave(iris2\$Sepal.Length, iris2\$Species)
> iris2\$s <- ave(iris2\$Sepal.Length, iris2\$Species, FUN=sd)
>
> iris2 <- transform(iris2, z= (Sepal.Length-m)/s)
>
> iris2.2 <- subset(iris2, abs(z) < 2)
>
> aggregate(iris2.2, list(iris2.2\$Species), FUN=mean)
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow@intermountainmail.org
> (801) 408-8111
>
>
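Greg's example groups by a single factor (Species), whereas the question groups by both city and type. Since ave() accepts any number of grouping factors, the same three steps should carry over. A minimal sketch, assuming a data frame trans with columns city, type, and price as the question describes; the sample rows are made up for illustration, and the formula form of aggregate() stands in for the call Greg used:

```r
# Hypothetical sample data shaped like the 'trans' data frame from the question
set.seed(1)
trans <- data.frame(
  city  = rep(c("Seattle", "Tacoma"), each = 20),
  type  = rep(c("A", "B"), times = 20),
  price = c(rnorm(39, mean = 100, sd = 10), 1000)  # last row: a planted outlier
)

# Step 1: each row gets its own group's mean and sd;
# ave() groups by the interaction of all factors passed before FUN
trans$m <- ave(trans$price, trans$city, trans$type)            # FUN defaults to mean
trans$s <- ave(trans$price, trans$city, trans$type, FUN = sd)

# Step 2: keep rows within 2 sd of their group's mean.
# A single-row group has s = NA, so its z is NA and subset() drops it.
trans$z <- (trans$price - trans$m) / trans$s
kept <- subset(trans, abs(z) < 2)

# Step 3: final mean price by city and type over the filtered rows
final <- aggregate(price ~ city + type, data = kept, FUN = mean)
final
```

Because ave() returns a vector as long as its input, no separate lookup of group statistics is needed. Whether silently dropping rows from single-transaction groups is acceptable depends on the data.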
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Gabor
> Grothendieck
> Sent: Tuesday, July 11, 2006 1:06 PM
> To: Joshua Tokle
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] R newbie: logical subsets
>
> Try this, using the built-in anscombe data set:
>
> anscombe[!rowSums(abs(scale(anscombe)) > 2),]
>
>
>
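Gabor's one-liner standardizes each column with scale(), which uses that column's mean and standard deviation over the whole data set, hence Greg's remark about groups. It also drops a row if any column is more than 2 standard deviations out, not just one chosen measurement. A sketch of one way to keep the same idiom while standardizing within each group, using the built-in iris data split by Species; the helper drop_outliers() is hypothetical and not from the thread:

```r
# Hypothetical helper: Gabor's idiom applied to one group's rows.
# scale() converts each selected column to z-scores using this group's mean/sd;
# rowSums(z > 2) counts outlying columns per row, and `!` keeps rows with a count of 0.
drop_outliers <- function(d, cols) {
  z <- abs(scale(d[cols]))
  d[!rowSums(z > 2), ]
}

# Apply per Species, then stack the filtered pieces back together
pieces <- lapply(split(iris, iris$Species), drop_outliers, cols = 1:4)
iris_kept <- do.call(rbind, pieces)

# Mean of each measurement by Species after filtering
aggregate(. ~ Species, data = iris_kept, FUN = mean)
```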
> On 7/11/06, Joshua Tokle <jtokle@math.washington.edu> wrote:
>> Hello! I'm a newcomer to R hoping to replace some convoluted database
>> code with an R script. Unfortunately, I haven't been able to figure
>> out how to implement the following logic.
>>
>> Essentially, we have a database of transactions that are coded with a
>> geographic locale and a type. These are being loaded into a
>> data.frame with named variables city, type, and price. E.g.,
>> trans\$city and all that.
>>
>> We want to calculate mean prices by city and type, AFTER excluding
>> outliers. That is, we want to calculate the mean price in 3 steps:
>>
>> 1. calculate a mean and standard deviation by city and type over all
>>    transactions
>> 2. create a subset of the original data frame, excluding transactions
>>    that differ from the relevant mean by more than 2 standard deviations
>> 3. calculate a final mean by city and type based on this subset.
>>
>> I'm stuck on step 2. I would like to do something like the following:
>>
>> fs <- list(factor(trans\$city), factor(trans\$type))
>> means <- tapply(trans\$price, fs, mean)
>> stdevs <- tapply(trans\$price, fs, sd)
>>
>> filter <- abs(trans\$price - means[trans\$city, trans\$type]) <
>>     2*stdevs[trans\$city, trans\$type]
>>
>> sub <- subset(trans, filter)
>>
>> The above code doesn't work. What's the correct way to do this?
>>
>> Thanks,
>> Josh
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> http://www.R-project.org/posting-guide.html
>>
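Why the code in the question fails: with two factors, tapply() returns a city-by-type matrix, and indexing that matrix with the full city vector as rows and the full type vector as columns selects every combination of the two, an n-by-n matrix, instead of one cell per transaction. Indexing with a two-column matrix of (city, type) names picks exactly one cell per row. A minimal sketch, with invented sample data standing in for trans:

```r
# Hypothetical stand-in for the 'trans' data frame described in the question
trans <- data.frame(
  city  = c(rep("Seattle", 6), rep("Tacoma", 3)),
  type  = c(rep("A", 6), rep("B", 3)),
  price = c(10, 10, 10, 10, 10, 100, 50, 52, 51)
)

fs <- list(factor(trans$city), factor(trans$type))
means  <- tapply(trans$price, fs, mean)   # matrix: rows = cities, cols = types
stdevs <- tapply(trans$price, fs, sd)

# The original indexing returns all row/column pairings, not one value per row
dim(means[trans$city, trans$type])        # 9 9

# A two-column character matrix indexes cells by dimnames, one cell per row
key <- cbind(as.character(trans$city), as.character(trans$type))
filter <- abs(trans$price - means[key]) < 2 * stdevs[key]

sub <- subset(trans, filter)              # the price of 100 is excluded
```

Greg's ave() approach sidesteps this lookup, since ave() already returns one group statistic per row.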