Re: [R] How to do aggregate operations with non-scalar functions

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Wed 06 Apr 2005 - 12:15:19 EST

On Apr 5, 2005 6:59 PM, Itay Furman <itayf@u.washington.edu> wrote:
>
> Hi,
>
> I have a data set, the structure of which is something like this:
>
> > a <- rep(c("a", "b"), c(6,6))
> > x <- rep(c("x", "y", "z"), c(4,4,4))
> > df <- data.frame(a=a, x=x, r=rnorm(12))
>
> The true data set has >1 million rows. The factors "a" and "x"
> have about 70 levels each; combined together they subset 'df'
> into ~900 data frames.
> For each such subset I'd like to compute various statistics
> including quantiles, but I can't find an efficient way of
> doing this. aggregate() gives me the desired structure -
> namely, one row per subset - but I can use it only to compute
> a single quantile.
>
> > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
>   a x          x
> 1 a x  0.1693188
> 2 a y  0.1566322
> 3 b y -0.2677410
> 4 b z -0.6505710
>
> With by() I could compute several quantiles per subset at
> each shot, but the structure of the output is not
> convenient for further analysis and visualization.
>
> > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
> a: a
> x: x
>         0%        25%
> -0.7727268  0.1693188
> ----------------------------------------------------------
> a: b
> x: x
> NULL
> ----------------------------------------------------------
>
> [snip]
>
> I would like to end up with a data frame like this:
>
>   a x         0%        25%
> 1 a x -0.7727268  0.1693188
> 2 a y -0.3410671  0.1566322
> 3 b y -0.2914710 -0.2677410
> 4 b z -0.8502875 -0.6505710
>
> I checked sweep() and apply() and didn't see how to harness
> them for that purpose.
>
> So, is there a simple way to convert the object returned
> by by() into a data.frame?
> Or, is there a better way to go with this?
> Finally, if I should roll my own coercion function: any tips?
>

One can use

        do.call("rbind", by(df, list(a = a, x = x), f))

where f is a function that takes one subset of df and returns a one-row data frame.

In this case f can be written in terms of df.quantile, which is like quantile except that it returns a one-row data frame:

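        # Like quantile(), but returns a one-row data frame
        # with one column per requested probability.
        df.quantile <- function(x, p)
                as.data.frame(t(data.matrix(quantile(x, p))))

        # For one subset: the grouping columns (a, x) from its first
        # row, followed by the quantiles of r.
        f <- function(df, p = c(0.25, 0.5))
                cbind(df[1, 1:2], df.quantile(df[, "r"], p))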
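
Putting the pieces together on the sample data from the question, a minimal sketch might look like the following. The seed, the name res, and the row-name reset are my own additions for illustration, and p is set to c(0, 0.25) here only to match the output requested in the question. Combinations of a and x with no rows come back from by() as NULL, and rbind appears to simply drop them.

```r
# Sample data with the same shape as in the question (values are random).
set.seed(1)  # assumption: fixed seed, used only so the run is repeatable
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# Like quantile(), but returns a one-row data frame.
df.quantile <- function(x, p)
        as.data.frame(t(data.matrix(quantile(x, p))))

# One row per subset: grouping columns plus the quantiles of r.
# p = c(0, 0.25) is chosen to match the question's desired output.
f <- function(df, p = c(0, 0.25))
        cbind(df[1, 1:2], df.quantile(df[, "r"], p))

# by() applies f to each non-empty (a, x) subset; rbind stacks the rows.
res <- do.call("rbind", by(df, list(a = a, x = x), f))
rownames(res) <- NULL  # hypothetical tidy-up: renumber the rows 1..n
print(res)
```

The result should have one row for each non-empty combination of a and x, with the columns a, x, 0% and 25%.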

Received on Wed Apr 06 12:22:18 2005
