# Re: [R] How to do aggregate operations with non-scalar functions

From: Rich FitzJohn <rich.fitzjohn_at_gmail.com>
Date: Wed 06 Apr 2005 - 10:07:30 EST

Hi Itay,

Not sure if by() can do it directly, but this does it from first principles, using lapply() and tapply() (which aggregate() uses internally). It would be reasonably straightforward to wrap this up in a function.

```
a <- rep(c("a", "b"), c(6,6))
x <- rep(c("x", "y", "z"), c(4,4,4))
df <- data.frame(a=a, x=x, r=rnorm(12))

## Probabilities for quantile
p <- c(.25, .5, .75)

## This does the work of collapsing the data.frame: one table of
## quantiles (a by x) for each probability in `p'.
y <- lapply(p, function(prob)
            tapply(df$r, list(a=a, x=x), quantile, probs=prob))

## Then, we need to work out what combinations of a & x are possible:
## these are the header columns.  aggregate() does this in a much more
## complicated way, which may handle more difficult cases than this
## (e.g. if there are lots of missing values, or something).
vars <- expand.grid(dimnames(y[[1]]))

## Finish up by converting `y' into a true data.frame, and omitting
## all the cases where all the values in `y' are NA: these are
## combinations of a and x that we did not encounter.
y <- as.data.frame(lapply(y, as.vector))
names(y) <- paste(100 * p, "%", sep="")
i <- colSums(apply(y, 1, is.na)) != ncol(y)
y <- cbind(vars, y)[i,]
```
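
As a rough sketch of the wrapping mentioned above: the function name `quantiles_by` and its arguments (`values`, `groups`, `probs`) are made up for illustration, and it just repeats the same lapply()/tapply() steps, labelling the columns as percentages. Something along these lines should work, though it is untested on data of the size you describe:

```r
## Hypothetical helper (name and arguments are illustrative only):
## returns one row per observed combination of the grouping factors
## and one column per requested quantile.
quantiles_by <- function(values, groups, probs) {
  ## One table of quantiles per probability
  tabs <- lapply(probs, function(prob)
                 tapply(values, groups, quantile, probs=prob))
  ## All combinations of the grouping levels, in the same
  ## (column-major) order in which as.vector() unrolls each table
  vars <- expand.grid(dimnames(tabs[[1]]))
  out <- as.data.frame(lapply(tabs, as.vector))
  names(out) <- paste(100 * probs, "%", sep="")
  ## Drop combinations that never occur (all quantiles NA)
  keep <- rowSums(is.na(out)) != ncol(out)
  cbind(vars, out)[keep, ]
}

## Usage with the example data from the question
set.seed(1)
a <- rep(c("a", "b"), c(6,6))
x <- rep(c("x", "y", "z"), c(4,4,4))
df <- data.frame(a=a, x=x, r=rnorm(12))
res <- quantiles_by(df$r, list(a=df$a, x=df$x), c(.25, .5, .75))
```

Passing the grouping variables as a named list keeps the header columns named `a` and `x`; an unnamed list would still work, but expand.grid() would then fall back to its default column names.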

Cheers,
Rich

On Apr 6, 2005 10:59 AM, Itay Furman <itayf@u.washington.edu> wrote:
>
> Hi,
>
> I have a data set, the structure of which is something like this:
>
> > a <- rep(c("a", "b"), c(6,6))
> > x <- rep(c("x", "y", "z"), c(4,4,4))
> > df <- data.frame(a=a, x=x, r=rnorm(12))
>
> The true data set has >1 million rows. The factors "a" and "x"
> have about 70 levels each; combined together they subset 'df'
> into ~900 data frames.
> For each such subset I'd like to compute various statistics
> including quantiles, but I can't find an efficient way of
> doing this. aggregate() gives me the desired structure -
> namely, one row per subset - but I can use it only to compute
> a single quantile.
>
> > aggregate(df[,"r"], list(a=a, x=x), quantile, probs=0.25)
> a x x
> 1 a x 0.1693188
> 2 a y 0.1566322
> 3 b y -0.2677410
> 4 b z -0.6505710
>
> With by() I could compute several quantiles per subset at
> each shot, but the structure of the output is not
> convenient for further analysis and visualization.
>
> > by(df[,"r"], list(a=a, x=x), quantile, probs=c(0, 0.25))
> a: a
> x: x
> 0% 25%
> -0.7727268 0.1693188
> ----------------------------------------------------------
> a: b
> x: x
> NULL
> ----------------------------------------------------------
>
> [snip]
>
> I would like to end up with a data frame like this:
>
> a x 0% 25%
> 1 a x -0.7727268 0.1693188
> 2 a y -0.3410671 0.1566322
> 3 b y -0.2914710 -0.2677410
> 4 b z -0.8502875 -0.6505710
>
> I checked sweep() and apply() and didn't see how to harness
> them for that purpose.
>
> So, is there a simple way to convert the object returned
> by by() into a data.frame?
> Or, is there a better way to go with this?
> Finally, if I should roll my own coercion function: any tips?
>
> Thank you very much in advance,
> Itay
>
> ----------------------------------------------------------------
> itayf@u.washington.edu / +1 (206) 543 9040 / U of Washington
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
>

--
Rich FitzJohn