[R] A faster way to aggregate?

From: Dieter Menne <dieter.menne_at_menne-biomed.de>
Date: Mon 04 Jul 2005 - 19:45:27 EST


Dear List,

I have a logical data frame with NA's and a grouping factor, and I want to calculate
the % TRUE per column and group. With an indexed database, results are mainly
limited by printout time, but my R solution below keeps me waiting (there are
about 10* cases in the real data set).
Any suggestions to speed this up? Yes, I could wait for the result in real life, but I am just curious whether I did something wrong. In real life, the data set is ordered by group, but how can I make use of this with a data frame?

Dieter Menne

# Generate test data

ncol = 20
nrow = 20000

ngroup=nrow %/% 20
colrow=ncol*nrow
group = factor(floor(runif(nrow)*ngroup))
sc = data.frame(group,matrix(ifelse(runif(colrow) > 0.1,runif(colrow)>0.3,NA),
     nrow=nrow))

# aggregate

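system.time ({
  s = aggregate(sc[2:(ncol+1)],list(group = group),
    function(x) {
       # table() drops NA, so xt[1] counts FALSE and xt[2] counts TRUE
       # (xt[2] is NA if a group holds only TRUE or only FALSE values)
       xt=table(x)
       as.integer(100*xt[2]/(xt[1]+xt[2]))
    }
  )
})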
# 26.09 0.03 26.95 NA NA
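
One cheap thing to try, offered as a sketch and not as a measured result: table() builds a factor on every call, so replacing it with plain sums is likely to cut the per-group cost. The helper name pct_true is made up for this illustration, and the test data is deliberately smaller than above. Unlike the table() version, it returns 100 or 0 (not NA) when a group holds only TRUE or only FALSE values.

```r
# Hypothetical helper: integer percentage of TRUE among the non-missing values
pct_true = function(x) {
  as.integer(100 * sum(x, na.rm = TRUE) / sum(!is.na(x)))
}

# Smaller test data, built the same way as above
set.seed(42)
ncol = 5
nrow = 2000
ngroup = nrow %/% 20
colrow = ncol * nrow
group = factor(floor(runif(nrow) * ngroup))
sc = data.frame(group, matrix(ifelse(runif(colrow) > 0.1, runif(colrow) > 0.3, NA),
     nrow = nrow))

# Same aggregate() call, only the per-cell function changes
s = aggregate(sc[2:(ncol + 1)], list(group = group), pct_true)
```

The result keeps the layout of the original call (one row per group, one column per variable), so the two versions can be timed against each other with system.time().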

# by and apply

system.time ({
  s = by (sc[2:(ncol+1)],group,function(x) {
     apply(x,2,function(x) {
         xt=table(x)
         as.integer(100*xt[2]/(xt[1]+xt[2]))
       }
     )
    })
  s=do.call("rbind",s)
})

# 82.89 0.18 85.16 NA NA
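
A possible vectorized alternative, sketched under the assumption that the logical columns can be held as a matrix: rowsum() adds up matrix rows per group in a single call, so the percentage becomes a ratio of two grouped sums with no per-group R function call at all. The function name pct_true_by_group is hypothetical. The rows do not need to be ordered by group for this to work.

```r
# Hypothetical helper: % TRUE per group (rows) and column, ignoring NA
pct_true_by_group = function(m, group) {
  valid   = !is.na(m)
  # rowsum() needs numeric input, so turn the logical matrices into 0/1 integers
  n_true  = rowsum((valid & m) + 0L, group)   # TRUE count per group and column
  n_valid = rowsum(valid + 0L, group)         # non-missing count per group and column
  trunc(100 * n_true / n_valid)               # truncate like as.integer(); NaN if all NA
}

# Tiny hand-checkable example: two groups, two columns
m = matrix(c(TRUE, FALSE, NA, TRUE,
             TRUE, NA,    NA, FALSE), ncol = 2)
group = factor(c("a", "a", "b", "b"))
res = pct_true_by_group(m, group)
print(res)
```

For the test data above, the call would be pct_true_by_group(as.matrix(sc[2:(ncol+1)]), group). Groups with only TRUE or only FALSE values come out as 100 or 0 here, where the table() version gives NA.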



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jul 04 19:48:06 2005
