Re: [R] A faster way to aggregate?

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Tue 05 Jul 2005 - 01:22:02 EST

On 7/4/05, Dieter Menne <dieter.menne@menne-biomed.de> wrote:
> My Original question (edited)
>
> > I have a logical data frame with NAs and a grouping factor, and I want to
> > calculate the % TRUE per column and group. With an indexed database, results
> > are mainly limited by printout time, but my R solution below keeps me waiting.
> > Any suggestions to speed this up?
>
> Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
>
> > Look at ?rowsum
>
> Nearby colMeans works, but why so slow?
>
> Dieter Menne
>
> # Generate test data
> ncol = 20
> nrow = 20000
> ngroup=nrow %/% 20
> colrow=ncol*nrow
> group = factor(floor(runif(nrow)*ngroup))
> sc = data.frame(group,matrix(ifelse(runif(colrow) > 0.1,runif(colrow)>0.3,NA),
> nrow=nrow))
>
> # aggregate (still best)
> system.time ({
> s = aggregate(sc[2:(ncol+1)],list(group = group),
> function(x) {
> xt=table(x)
> as.integer(100*xt[2]/(xt[1]+xt[2]))
> }
> )
> })
> # 26.09 0.03 26.95 NA NA
>
> # by and apply
> system.time ({
> s1 = by (sc[2:(ncol+1)],group,function(x) {
> as.integer(100*colMeans(x,na.rm=TRUE))
>
> })
> s1=as.data.frame(do.call("rbind",s1))
> })
>
> # 51.49 0.93 52.60 NA NA
>

Note that you did not actually try my suggestion, which was rowsum, not colMeans.

The following solution based on rowsum is more than an order of magnitude faster than any of the solutions in your posts:

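	# Drop the grouping column and work on a plain logical matrix
	sc1 <- as.matrix(sc[,-1])
	is.na.sc1 <- is.na(sc1)
	# Per-group count of TRUE values (NA counted as 0)
	x1 <- rowsum(ifelse(is.na.sc1, 0, sc1), group)
	# Per-group count of non-NA values
	xx <- rowsum(1-is.na.sc1, group)
	# Percentage TRUE, rounded down
	res <- floor(100*x1/xx)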
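
As a sanity check, here is a minimal self-contained sketch that compares the rowsum approach with a plain per-group loop on a small data set. The sizes, the seed, and the helper names pct_rowsum and pct_reference are assumptions made for illustration only; they are not part of the original posts. Both helpers use the same 100*count/total arithmetic so that floor() sees identical values.

```r
# Small reproducible test data (sizes and seed are illustrative assumptions)
set.seed(1)
nr <- 2000; nc <- 5; ng <- 50
group <- factor(sample(ng, nr, replace = TRUE))
m <- matrix(ifelse(runif(nr * nc) > 0.1, runif(nr * nc) > 0.3, NA), nrow = nr)

# rowsum approach: per-group TRUE counts divided by per-group non-NA counts
pct_rowsum <- function(m, group) {
  na <- is.na(m)
  n_true <- rowsum(ifelse(na, 0, m), group)  # NA contributes 0 to the TRUE count
  n_obs  <- rowsum(1 - na, group)            # number of non-NA entries
  floor(100 * n_true / n_obs)
}

# Reference: loop over groups and count column by column (slower, but obvious)
pct_reference <- function(m, group) {
  idx <- split(seq_len(nrow(m)), group)
  t(sapply(idx, function(i) {
    x <- m[i, , drop = FALSE]
    floor(100 * colSums(x, na.rm = TRUE) / colSums(!is.na(x)))
  }))
}

fast <- pct_rowsum(m, group)
ref  <- pct_reference(m, group)

# Should be TRUE if both approaches agree for every group and column
isTRUE(all.equal(unname(fast), unname(ref)))
```

Wrapping each call in system.time() on the full-size test data from the quoted post should show the difference in run time on your own machine.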
