Date: Thu 07 Apr 2005 - 16:33:32 EST

On Apr 7, 2005 1:18 AM, Itay Furman <itayf@u.washington.edu> wrote:

> On Tue, 5 Apr 2005, Gabor Grothendieck wrote:

> > On Apr 5, 2005 6:59 PM, Itay Furman <itayf@u.washington.edu> wrote:
> >> Hi,
> >> I have a data set, the structure of which is something like this:
> >>> a <- rep(c("a", "b"), c(6,6))
> >>> x <- rep(c("x", "y", "z"), c(4,4,4))
> >>> df <- data.frame(a=a, x=x, r=rnorm(12))
> >> The true data set has >1 million rows. The factors "a" and "x"
> >> have about 70 levels each; combined together they subset 'df'
> >> into ~900 data frames.
> >> For each such subset I'd like to compute various statistics
> >> including quantiles, but I can't find an efficient way of doing this.
> >> I would like to end up with a data frame like this:
> >>
> >>   a x         0%        25%
> >> 1 a x -0.7727268  0.1693188
> >> 2 a y -0.3410671  0.1566322
> >> 3 b y -0.2914710 -0.2677410
> >> 4 b z -0.8502875 -0.6505710
> > One can use
> >
> > do.call("rbind", by(df, list(a = a, x = x), f))
> >
> > where f is the appropriate function.
> > In this case f can be described in terms of df.quantile which
> > is like quantile except it returns a one row data frame:
> >
> > df.quantile <- function(x,p)
> > as.data.frame(t(data.matrix(quantile(x, p))))
> >
> > f <- function(df, p = c(0.25, 0.5))
> > cbind(df[1,1:2], df.quantile(df[,"r"], p))
> >
> Thanks! Just what I wanted.
> A minor point is that for some reason the row numbers in the
> final data frame are not sequential (see below -- this is not a
> consequence of my changes).
These are the original row numbers of the first row of each combination of a and x. If z is the result of do.call, you can always renumber them if needed:

  row.names(z) <- 1:nrow(z)
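
For reference, here is the whole thing put together as a small runnable sketch. It reuses the toy data and the df.quantile / f definitions quoted above; the set.seed() call and the variable name z are my own additions for illustration, and the braces around the function bodies are only cosmetic.

```r
# End-to-end sketch of the by() + do.call("rbind", ...) approach.
set.seed(1)  # assumption: fixed seed, only to make the toy data reproducible

a  <- rep(c("a", "b"), c(6, 6))
x  <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# Like quantile(), except that it returns a one-row data frame
df.quantile <- function(x, p) {
  as.data.frame(t(data.matrix(quantile(x, p))))
}

# For one subset: keep the (a, x) labels from its first row and
# append the requested quantiles of column r
f <- function(df, p = c(0.25, 0.5)) {
  cbind(df[1, 1:2], df.quantile(df[, "r"], p))
}

# Apply f to every non-empty (a, x) combination and stack the rows;
# empty combinations yield NULL, which rbind drops
z <- do.call("rbind", by(df, list(a = a, x = x), f))

# Row names are inherited from the first row of each subset,
# so renumber them sequentially
row.names(z) <- 1:nrow(z)
print(z)
```

Setting row.names(z) <- NULL should have the same renumbering effect. I have not timed any of this on a data set with a million rows, so treat it as an illustration of the idea rather than a statement about performance.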

