RE: [R] aggregation question

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Sat 16 Apr 2005 - 12:15:39 EST


> From: Christoph Lehmann
>
> great, Andy! Thanks a lot - I didn't know split.
> So 'split' can be used as an alternative to 'aggregate', with the advantage
> that in the passed self-defined function one can consider more than one
> variable of the to-be-aggregated data.frame?

split() only splits the data frame into a list of data frames, according to the variable supplied as the second argument. You can then use sapply()/lapply() to apply the same operation to each piece, where each piece contains all the variables.
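A minimal sketch of that pattern, assuming a data frame `dat` shaped like the example quoted below. The `meas` values are freshly generated here, so the sums will not match the output quoted further down; the names `pieces`, `sum_wo2` and `n_dates` are only illustrative.

```r
# Illustrative data: same id/date layout as the quoted example,
# but with newly generated (hypothetical) meas values.
set.seed(1)
dat <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  meas = runif(12),
  date = c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4)
)

# split() returns a named list with one data frame per id;
# each piece keeps all the columns (id, meas, date).
pieces <- split(dat, dat$id)

# Because the function receives a whole data frame, it can use
# one column (date) to decide which values of another (meas) to sum.
sum_wo2 <- sapply(pieces, function(d) sum(d$meas[d$date != 2]))

# The same pattern gives the number of distinct dates per id.
n_dates <- sapply(pieces, function(d) length(unique(d$date)))

sum_wo2
n_dates
```

sapply() simplifies the result to a named vector here because the function returns a single number per piece; lapply() would return a list instead.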

Andy  

> Christoph
> > If I understood you correctly, here's one way:
> >
> > > sumWO2 <- sapply(split(dat, dat$id), function(d) sum(d$meas[d$date != 2]))
> > > sumWO2
> >         a         b         c
> > 0.9439614 0.4481582 1.6967618
> >
> > Andy
> >
> >
> > > From: Christoph Lehmann
> > >
> > > Dear Sundar, dear Andy
> > > many thanks for the length(unique(x)) hint. Of course it solves my
> > > problem in a very elegant way. Just out of curiosity (or for potential
> > > future problems): how could I solve it in a conceptually different way,
> > > namely where the computation on 'meas' depends on the variable 'date'?
> > > That is, the computation on a variable x in the function passed to
> > > aggregate is conditional on the value of another variable y. I hope you
> > > understand what I mean; let's think of an example:
> > >
> > > E.g. for the example data.frame below, the sum shall be taken over the
> > > variable meas only for entries with a corresponding 'date' != 2.
> > >
> > > For this, do I have to nest two aggregate statements, or is there a way
> > > using sapply or similar apply-based commands?
> > >
> > > thanks a lot for your kind help.
> > >
> > > Cheers!
> > >
> > > Christoph
> > >
> > > aggregate(data$meas, list(id = data$id), sum)
> > > >
> > > >
> > > > Christoph Lehmann wrote on 4/15/2005 9:51 AM:
> > > > > Hi I have a question concerning aggregation
> > > > >
> > > > > > (simple demo code, see below)
> > > > >
> > > > > I have the data.frame
> > > > >
> > > > > >    id        meas date
> > > > > > 1   a 0.637513747    1
> > > > > > 2   a 0.187710063    2
> > > > > > 3   a 0.247098459    2
> > > > > > 4   a 0.306447690    3
> > > > > > 5   b 0.407573577    2
> > > > > > 6   b 0.783255085    2
> > > > > > 7   b 0.344265082    3
> > > > > > 8   b 0.103893068    3
> > > > > > 9   c 0.738649586    1
> > > > > > 10  c 0.614154037    2
> > > > > > 11  c 0.949924371    3
> > > > > > 12  c 0.008187858    4
> > > > >
> > > > > When I want for each id the sum of its meas I do:
> > > > >
> > > > > aggregate(data$meas, list(id = data$id), sum)
> > > > >
> > > > > > If I want to know the number of meas(ures) for each id I do, e.g.
> > > > > >
> > > > > > aggregate(data$meas, list(id = data$id), length)
> > > > > >
> > > > > > NOW: Is there a way to compute, for each id, the number of meas(ures)
> > > > > > with non-identical date (e.g. using diff())?
> > > > > > so that I get e.g.:
> > > > > >
> > > > > >   id x
> > > > > > 1  a 3
> > > > > > 2  b 2
> > > > > > 3  c 4
> > > > >
> > > > >
> > > > > I am sure it must be possible
> > > > >
> > > > > thanks for any (even short) hint
> > > > >
> > > > > cheers
> > > > > Christoph
> > > > >
> > > > >
> > > > >
> > > > > --------------
> > > > > > data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)),
> > > > > >                    runif(12), c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
> > > > > names(data) <- c("id", "meas", "date")
> > > > >
> > > > > m <- aggregate(data$meas, list(id = data$id), sum)
> > > > > names(m) <- c("id", "cum.meas")
> > > > >
> > > >
> > > >
> > > > How about:
> > > >
> > > > m <- aggregate(data["date"], data["id"],
> > > > function(x) length(unique(x)))
> > > >
> > > > --sundar
> > > >
> > >
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
>



Received on Sat Apr 16 12:20:22 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:12 EST