# [R] R newbie: logical subsets

From: Joshua Tokle <jtokle_at_math.washington.edu>
Date: Wed 12 Jul 2006 - 04:51:01 EST

Hello! I'm a newcomer to R hoping to replace some convoluted database code with an R script. Unfortunately, I haven't been able to figure out how to implement the following logic.

Essentially, we have a database of transactions that are coded with a geographic locale and a type. These are being loaded into a data.frame, `trans`, with named variables city, type, and price, e.g. `trans$city` and so on.

We want to calculate mean prices by city and type, AFTER excluding outliers. That is, we want to calculate the mean price in 3 steps:

1. calculate a mean and standard deviation by city and type over all transactions
2. create a subset of the original data frame, excluding transactions that differ from the relevant mean by more than 2 standard deviations
3. calculate a final mean by city and type based on this subset.
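A minimal sketch of the three steps, assuming the column names above. It uses base R's `ave()` instead of the `tapply()` lookup: `ave()` returns the group statistic for every row, already aligned with `trans$price`, so step 2 needs no indexing at all. The data frame is hypothetical toy data, not real transactions.

```r
# Hypothetical toy data: one outlier (200) among the Seattle/A transactions.
trans <- data.frame(
  city  = c(rep("Seattle", 7), rep("Tacoma", 3)),
  type  = c(rep("A", 7), rep("B", 3)),
  price = c(100, 101, 99, 100, 102, 98, 200, 50, 52, 48),
  stringsAsFactors = FALSE
)

# One factor per observed city/type combination (drop = TRUE skips empty ones).
grp <- interaction(trans$city, trans$type, drop = TRUE)

# Step 1: ave() gives each row the mean / sd of its own city/type group.
grp_mean <- ave(trans$price, grp, FUN = mean)
grp_sd   <- ave(trans$price, grp, FUN = sd)

# Step 2: keep rows within 2 standard deviations of their group mean.
# A one-row group has sd NA; subset() treats the resulting NA as FALSE.
keep <- abs(trans$price - grp_mean) < 2 * grp_sd
sub  <- subset(trans, keep)

# Step 3: final mean by city and type on the filtered rows
# (empty city/type cells come out as NA).
final_means <- tapply(sub$price, list(sub$city, sub$type), mean)
final_means
```

On this toy data only the 200 is dropped, so the final Seattle/A mean is 100. Note that with the sample standard deviation no point in a group of five or fewer rows can lie 2 or more standard deviations from the mean, so small groups are never trimmed by this rule.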

I'm stuck on step 2. I would like to do something like the following:

```r
fs <- list(factor(trans$city), factor(trans$type))
means <- tapply(trans$price, fs, mean)
stdevs <- tapply(trans$price, fs, sd)

filter <- abs(trans$price - means[trans$city, trans$type]) <
  2*stdevs[trans$city, trans$type]

sub <- subset(trans, filter)
```

The above code doesn't work. What's the correct way to do this?
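One way to see what goes wrong, sketched on hypothetical toy data: indexing a matrix with two vectors, as in `means[trans$city, trans$type]`, returns every combination of the listed rows and columns (an n-by-n matrix for n transactions) rather than one value per transaction. Indexing with a single two-column matrix instead picks one cell per row; a character matrix is matched against the dimnames that `tapply()` assigns. This keeps the `tapply()` approach from the question; the name `keep` is used in place of `filter` only to avoid masking `stats::filter`.

```r
# Hypothetical toy data in the same shape as `trans`.
trans <- data.frame(
  city  = c(rep("Seattle", 7), rep("Tacoma", 3)),
  type  = c(rep("A", 7), rep("B", 3)),
  price = c(100, 101, 99, 100, 102, 98, 200, 50, 52, 48),
  stringsAsFactors = FALSE
)

fs     <- list(factor(trans$city), factor(trans$type))
means  <- tapply(trans$price, fs, mean)   # city x type matrix of means
stdevs <- tapply(trans$price, fs, sd)     # city x type matrix of sds

# Why the original fails: two index vectors select the full
# cross-product of rows and columns, not one cell per transaction.
dim(means[trans$city, trans$type])        # [1] 10 10

# Fix: one (city, type) pair per row, as character so the lookup
# matches dimnames even if the columns are factors.
idx  <- cbind(as.character(trans$city), as.character(trans$type))
keep <- abs(trans$price - means[idx]) < 2 * stdevs[idx]
sub  <- subset(trans, keep)
```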

Thanks,
Josh

R-help@stat.math.ethz.ch mailing list