Re: [R] Help on aggregate method

From: Joris Meys <jorismeys_at_gmail.com>
Date: Tue, 01 Jun 2010 17:27:37 +0200

Take a look at ?split and ?unsplit: compute the mean once per group, then put the rows back in their original order.

For example:
Dur <- rnorm(100)
Attr1=rep(c("A","B"),each=50)
Attr2=rep(c("A","B"),times=50)

ap.dat <-data.frame(Attr1,Attr2,Dur)

split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
ap.list <-split(ap.dat,split.fact)
ap.mean <- lapply(ap.list, function(x) {
        # add the group mean to every row of this subset
        x$meanDur <- rep(mean(x$Dur), nrow(x))
        return(x)
  })

ap.dat.fast <- unsplit(ap.mean,split.fact)
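
To convince yourself that unsplit() puts every row back where it started, a small self-contained check along these lines may help. This is only a sketch: the seed, the name check.mean, and the use of ave() as an independent cross-check are additions for illustration and were not part of the example above.

```r
set.seed(1)  # assumed seed, only so the run is repeatable
Dur <- rnorm(100)
Attr1 <- rep(c("A", "B"), each = 50)
Attr2 <- rep(c("A", "B"), times = 50)
ap.dat <- data.frame(Attr1, Attr2, Dur)

# split by the combined key, add the group mean, then restore row order
split.fact <- paste(ap.dat$Attr1, ap.dat$Attr2)
ap.list <- split(ap.dat, split.fact)
ap.mean <- lapply(ap.list, function(x) {
  x$meanDur <- rep(mean(x$Dur), nrow(x))
  x
})
ap.dat.fast <- unsplit(ap.mean, split.fact)

# independent cross-check (hypothetical name): ave() returns the
# group mean for every row, using mean as its default FUN
check.mean <- ave(ap.dat$Dur, ap.dat$Attr1, ap.dat$Attr2)

# rows come back in the original order and the means agree
stopifnot(identical(ap.dat.fast$Dur, ap.dat$Dur))
stopifnot(isTRUE(all.equal(ap.dat.fast$meanDur, check.mean)))
```

If both stopifnot() calls pass silently, the split/unsplit result lines up row for row with the original data frame.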

Running system.time on 1000 replicates of each approach gives:
> system.time(replicate(1000,{

+ split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
+ ap.list <-split(ap.dat,split.fact)
+ ap.mean <-lapply(ap.list,functi .... [TRUNCATED]

   user system elapsed
   4.88 0.00 4.88
> source(.trPaths[5], echo=TRUE, max.deparse.length=150)

> system.time(replicate(1000,{

+ avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
+ ap.dat[["Attr2"]]), FUN="mean")
+ meanDur <- sapp .... [TRUNCATED]

   user system elapsed
  58.00 0.11 58.13
>

So the split/unsplit version is more than ten times faster here (4.88 s versus 58.13 s elapsed).
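
Since the console output above is truncated, here is a sketch of how the comparison could be rerun. It assumes the truncated calls correspond to the split/unsplit code earlier in this message and to the aggregate/sapply code in the quoted message below. The wrapper names viaSplit and viaAggregate and the replicate count of 20 are hypothetical choices made for this sketch, and the timings will differ from machine to machine.

```r
Dur <- rnorm(100)
Attr1 <- rep(c("A", "B"), each = 50)
Attr2 <- rep(c("A", "B"), times = 50)
ap.dat <- data.frame(Attr1, Attr2, Dur)

# split/unsplit approach (hypothetical wrapper name)
viaSplit <- function(d) {
  f <- paste(d$Attr1, d$Attr2)
  parts <- lapply(split(d, f), function(x) {
    x$meanDur <- rep(mean(x$Dur), nrow(x))
    x
  })
  unsplit(parts, f)
}

# aggregate plus row-by-row lookup, as in the quoted message
matchMean <- function(ind, dataTable, aggrTable) {
  index <- which((aggrTable[, 1] == dataTable[["Attr1"]][ind]) &
                 (aggrTable[, 2] == dataTable[["Attr2"]][ind]))
  as.numeric(aggrTable[index, 3])
}
viaAggregate <- function(d) {
  avgDur <- aggregate(d[["Dur"]], by = list(d[["Attr1"]], d[["Attr2"]]),
                      FUN = "mean")
  meanDur <- sapply(seq_len(nrow(d)), FUN = matchMean, d, avgDur)
  cbind(d, meanDur)
}

# both approaches should produce the same column of group means
stopifnot(isTRUE(all.equal(viaSplit(ap.dat)$meanDur,
                           viaAggregate(ap.dat)$meanDur)))

# fewer replicates than in the transcript, to keep the run short
system.time(replicate(20, viaSplit(ap.dat)))
system.time(replicate(20, viaAggregate(ap.dat)))
```

Wrapping each approach in a function keeps the two timed expressions comparable and makes it easy to confirm that they agree before timing them.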

Cheers
Joris

On Tue, Jun 1, 2010 at 4:48 PM, Stella Pachidi <stella.pachidi_at_gmail.com> wrote:

> Dear R experts,
>
> I would really appreciate any ideas on how to use the aggregate
> method more efficiently:
>
> More specifically, I would like to calculate the mean of certain
> values on a data frame, grouped by various attributes, and then
> create a new column in the data frame that will have the corresponding
> mean for every row. I attach part of my code:
>
> matchMean <- function(ind,dataTable,aggrTable)
> {
> index <- which((aggrTable[,1]==dataTable[["Attr1"]][ind]) &
> (aggrTable[,2]==dataTable[["Attr2"]][ind]))
> as.numeric(aggrTable[index,3])
> }
>
> avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
> ap.dat[["Attr2"]]), FUN="mean")
> meanDur <- sapply((1:length(ap.dat[,1])), FUN=matchMean, ap.dat, avgDur)
> ap.dat <- cbind (ap.dat, meanDur)
>
> As I deal with a very large dataset, it takes a long time to run my
> matching function, so if you have an idea on how to speed up this
> matching process I would be really grateful.
>
> Thank you very much in advance!
>
> Kind regards,
> Stella
>
>
>
> --
> Stella Pachidi
> Master in Business Informatics student
> Utrecht University
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
Joris.Meys_at_Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

Received on Tue 01 Jun 2010 - 15:29:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Jun 2010 - 15:30:26 GMT.

