Re: [R] Comparison of aggregate in R and group by in mysql

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Sat, 26 Jan 2008 19:07:08 -0500

How does it compare if you read the data into R and then do the aggregation with sqldf?

library(sqldf)

# example using builtin data set CO2
CO2agg <- sqldf("select Plant, Type, Treatment, avg(conc) from CO2 group by Plant, Type, Treatment")

# or using your data (Group is an SQL keyword, so it is quoted with square brackets):

Xagg <- sqldf("select [Group], Age, Type, avg(Salary) from X group by [Group], Age, Type")
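
If you want to time the two approaches side by side, something along these lines might do as a rough sketch. The data frame below is a synthetic stand-in for your X: the row count, the number of levels and the salary range are all made up, so the timings will only be indicative.

```r
library(sqldf)

# Synthetic stand-in for X; sizes and values are assumptions for illustration
set.seed(1)
n <- 100000
X <- data.frame(
  Group  = sample(sprintf("g%03d", 1:200), n, replace = TRUE),
  Age    = sample(20:65, n, replace = TRUE),
  Type   = sample(sprintf("t%02d", 1:50), n, replace = TRUE),
  Salary = runif(n, 20000, 120000),
  stringsAsFactors = FALSE
)

# Base R: note that the argument is FUN (upper case)
t_agg <- system.time(
  a1 <- aggregate(X$Salary,
                  by = list(Group = X$Group, Age = X$Age, Type = X$Type),
                  FUN = mean)
)

# sqldf: Group is an SQL keyword, so it is quoted with square brackets
t_sql <- system.time(
  a2 <- sqldf("select [Group], Age, Type, avg(Salary) as Salary
               from X group by [Group], Age, Type")
)

# Timings for both approaches, one row each
print(rbind(aggregate = t_agg, sqldf = t_sql))

# Both results have one row per combination that occurs in the data
stopifnot(nrow(a1) == nrow(a2))
```

Increasing n and the number of distinct Group and Type values should show how each approach scales on your machine.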

On Jan 26, 2008 6:45 PM, zhihuali <lzhtom_at_hotmail.com> wrote:
>
> Hi, netters,
>
> First of all, thanks a lot for all the prompt replies to my earlier question about "merging" data frames in R.
> That is actually equivalent to the "join" clause in MySQL.
>
> Now I have another question. Suppose I have a data frame X with lots of columns/variables:
> Name, Age, Group, Type, Salary.
> I want to compute the mean salary for each combination of Group, Age and Type:
> aggregate(X$Salary, by=list(X$Group,X$Age,X$Type), FUN=mean)
>
> When Group and Type have a very large number of levels, R takes forever to finish the aggregation.
> I also used gc() and found that the memory usage was high.
>
> However, in MySQL a similar job finishes in seconds:
> select Group,Age,Type ,avg(Salary) from X group by Group,Age,Type
>
> Is it because MySQL is better at this kind of task? Or is my R command not efficient enough? Why does R need so much memory to do the aggregation?
>
> Thanks again!
>
> Zhihua Li
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



Received on Sun 27 Jan 2008 - 00:12:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 27 Jan 2008 - 00:30:09 GMT.
