Re: [R] A faster way to aggregate?

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Mon 04 Jul 2005 - 21:12:58 EST

On 7/4/05, Dieter Menne <dieter.menne@menne-biomed.de> wrote:
> Dear List,
>
> I have a logical data frame with NAs and a grouping factor, and I want to
> calculate the % TRUE per column and group. With an indexed database, results
> are mainly limited by printout time, but my R solution below makes me wait
> (there are about 10* cases in the real data set).
> Any suggestions to speed this up? Yes, I could wait for the result in real
> life, but I am just curious whether I did something wrong. In real life, the
> data set is ordered by groups, but how can I use this with a data frame?
>
> Dieter Menne
>
>
> # Generate test data
> ncol = 20
> nrow = 20000
> ngroup=nrow %/% 20
> colrow=ncol*nrow
> group = factor(floor(runif(nrow)*ngroup))
> sc = data.frame(group,
>                 matrix(ifelse(runif(colrow) > 0.1, runif(colrow) > 0.3, NA),
>                        nrow = nrow))
>
> # aggregate
> system.time({
>   s = aggregate(sc[2:(ncol+1)], list(group = group),
>     function(x) {
>       xt = table(x)
>       as.integer(100*xt[2]/(xt[1]+xt[2]))
>     }
>   )
> })
> # 26.09 0.03 26.95 NA NA
>
> # by and apply
> system.time({
>   s = by(sc[2:(ncol+1)], group, function(x) {
>     apply(x, 2, function(x) {
>       xt = table(x)
>       as.integer(100*xt[2]/(xt[1]+xt[2]))
>     })
>   })
>   s = do.call("rbind", s)
> })
>
> # 82.89 0.18 85.16 NA NA
>

Look at ?rowsum
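
A minimal sketch of how rowsum could be applied here. This is not from the
original reply: the names vals, n_true, n_valid and pct_true are made up for
illustration, and the test data is shrunk. rowsum needs numeric input, so the
logical matrix is turned into two 0/1 matrices, one flagging TRUE cells and
one flagging non-missing cells. The ratio of their per-group sums is the
% TRUE with NAs excluded.

```r
# Test data with the same structure as in the question, but smaller
set.seed(1)
ncol   <- 5
nrow   <- 2000
ngroup <- nrow %/% 20
colrow <- ncol * nrow
group  <- factor(floor(runif(nrow) * ngroup))
vals   <- matrix(ifelse(runif(colrow) > 0.1, runif(colrow) > 0.3, NA),
                 nrow = nrow)

# 0/1 indicator matrices (rowsum requires numeric input, not logical)
is_true  <- (!is.na(vals) & vals) * 1L   # 1 where the cell is TRUE, 0 for FALSE or NA
is_valid <- (!is.na(vals)) * 1L          # 1 where the cell is not NA

# Per-group column sums, one row per group
n_true  <- rowsum(is_true,  group)
n_valid <- rowsum(is_valid, group)

# Percentage TRUE among the non-missing values, per group and column
pct_true <- 100 * n_true / n_valid
head(pct_true)
```

Unlike the table()-based function in the question, this does not truncate to
integer (trunc(pct_true) would do that), and it gives 0 or 100 rather than NA
when a group/column contains only FALSE or only TRUE values. A group/column
with no non-missing values comes out as NaN.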



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jul 04 21:15:35 2005
