From: Gabor Grothendieck <ggrothendieck_at_gmail.com>

Date: Wed 06 Apr 2005 - 12:15:19 EST

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Wed Apr 06 12:22:18 2005

On Apr 5, 2005 6:59 PM, Itay Furman <itayf@u.washington.edu> wrote:

> Hi,
>
> I have a data set, the structure of which is something like this:
>
> > a <- rep(c("a", "b"), c(6,6))
> > x <- rep(c("x", "y", "z"), c(4,4,4))
> > df <- data.frame(a=a, x=x, r=rnorm(12))
>
> The true data set has >1 million rows. The factors "a" and "x"
> have about 70 levels each; combined together they subset 'df'
> into ~900 data frames.
> For each such subset I'd like to compute various statistics
> including quantiles, but I can't find an efficient way of
> doing this. aggregate() gives me the desired structure -
> namely, one row per subset - but I can use it only to compute
> a single quantile.
>
> > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
>   a x          x
> 1 a x  0.1693188
> 2 a y  0.1566322
> 3 b y -0.2677410
> 4 b z -0.6505710
>
> With by() I could compute several quantiles per subset at
> each shot, but the structure of the output is not
> convenient for further analysis and visualization.
>
> > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
> a: a
> x: x
>         0%        25%
> -0.7727268  0.1693188
> ----------------------------------------------------------
> a: b
> x: x
> NULL
> ----------------------------------------------------------
>
> [snip]
>
> I would like to end up with a data frame like this:
>
>   a x         0%        25%
> 1 a x -0.7727268  0.1693188
> 2 a y -0.3410671  0.1566322
> 3 b y -0.2914710 -0.2677410
> 4 b z -0.8502875 -0.6505710
>
> I checked sweep() and apply() and didn't see how to harness
> them for that purpose.
>
> So, is there a simple way to convert the object returned
> by by() into a data.frame?
> Or, is there a better way to go with this?
> Finally, if I should roll my own coercion function: any tips?
>

do.call("rbind", by(df, list(a = a, x = x), f))

where f is a function that takes the data frame for one subset and returns a one-row data frame.

In this case f can be written in terms of df.quantile, which is like quantile() except that it returns a one-row data frame:

df.quantile <- function(x, p) as.data.frame(t(data.matrix(quantile(x, p))))
f <- function(df, p = c(0.25, 0.5)) cbind(df[1, 1:2], df.quantile(df[, "r"], p))
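
For anyone who wants to try this end to end, here is a minimal sketch that puts the pieces together on the toy data from the question. The set.seed() call, the name res, the extra p = c(0, 0.25) argument passed through by(), and the row-name reset are my additions for illustration and are not part of the reply above; the values you get depend on the random draw.

```r
# Toy data shaped like the example in the question
# (set.seed() is an addition so the run is reproducible)
set.seed(1)
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# Like quantile(), but returns a one-row data frame,
# with one column per requested probability
df.quantile <- function(x, p) as.data.frame(t(data.matrix(quantile(x, p))))

# For one subset: take the (a, x) labels from its first row
# and append the quantiles of r as extra columns
f <- function(df, p = c(0.25, 0.5)) cbind(df[1, 1:2], df.quantile(df[, "r"], p))

# by() calls f on each non-empty (a, x) subset; extra arguments (here p)
# are passed on to f. Empty combinations come back as NULL, which
# rbind() skips when stacking the one-row results.
res <- do.call("rbind", by(df, list(a = a, x = x), f, p = c(0, 0.25)))

# Row names are inherited from the first row of each subset; reset them
# ('res' and this step are illustrative, not from the original reply)
rownames(res) <- NULL
print(res)
```

The result has one row per (a, x) combination that actually occurs in the data, with the columns a, x, 0% and 25%, which is the layout asked for in the question.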

