Re: [R] Comparison of aggregate in R and group by in mysql

From: zhihuali <lzhtom_at_hotmail.com>
Date: Sun, 27 Jan 2008 01:14:46 +0000

I repeated your experiment:
> n <- 1000000
> x <- data.frame(A=sample(LETTERS,n,TRUE), B=sample(letters[1:4],n,TRUE),C=sample(LETTERS[1:4], n, TRUE), data=runif(n))
> system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean))

   user system elapsed
  1.824 0.212 2.038

Now I use my own data:
> length(levels(group))

[1] 7049
> length(levels(type))

[1] 11704
> y<-data.frame(group,type,signal)
> system.time(y.agg <- aggregate(y$signal, list(y$group,y$type), mean))

   (I killed it after 30 minutes)

> Date: Sat, 26 Jan 2008 19:55:51 -0500
> From: jholtman_at_gmail.com
> To: lzhtom_at_hotmail.com
> Subject: Re: [R] Comparison of aggregate in R and group by in mysql
> CC: r-help_at_stat.math.ethz.ch
>
> How large is your dataframe? How much memory do you have on your
> system? Are you paging? Here is a test I ran with a data frame with
> 1,000,000 entries and it seems to be fast:
>
> > n <- 1000000
> > x <- data.frame(A=sample(LETTERS,n,TRUE), B=sample(letters[1:4],n,TRUE),
> + C=sample(LETTERS[1:4], n, TRUE), data=runif(n))
> > system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean))
> user system elapsed
> 2.65 0.34 3.00
> >
>
> On Jan 26, 2008 6:45 PM, zhihuali <lzhtom_at_hotmail.com> wrote:
> >
> > Hi, netters,
> >
> > First of all, thanks a lot for all the prompt replies to my earlier question about "merging" data frames in R.
> > Actually that's an equivalence to the "join" clause in mysql.
> >
> > Now I have another question. Suppose I have a data frame X with lots of columns/variables:
> > Name, Age,Group, Type, Salary.
> > I wanna do a subtotal of salaries:
> > aggregate(X$Salary, by=list(X$Group,X$Age,X$Type),Fun=mean)
> >
> > When the levels of Group and Type are huge, it took R forever to finish the aggregation.
> > And I used gc to find that the memory usage was big too.
> >
> > However, in mysql, it took seconds to finish a similar job:
> > select Group,Age,Type ,avg(Salary) from X group by Group,Age,Type
> >
> > Is it because mysql is superior in doing such kind of things? Or my R command is not efficient enough? Why did R have to consume huge memories to do the aggregation?
> >
> > Thanks again!
> >
> > Zhihua Li
> >
> > _________________________________________________________________
> > 天凉了,添衣了,心动了,"七件"了
> > http://get.live.cn
> > [[alternative HTML version deleted]]
> >
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?



MSN圣诞礼物火热登场,免费发放中,快来领取吧! http://im.live.cn/emoticons/?ID=18

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 27 Jan 2008 - 02:07:40 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 27 Jan 2008 - 02:30:09 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive