Re: [R] MySql Versus R

From: Barry Rowlingson <b.rowlingson_at_lancaster.ac.uk>
Date: Fri, 01 Apr 2011 12:36:44 +0100

On Fri, Apr 1, 2011 at 11:46 AM, Henri Mone <henriMone_at_gmail.com> wrote:
> Dear R Users,
>
> I use a combination of MySQL and GNU R for my data crunching. I have
> to handle huge/medium-sized data which is stored in a MySQL
> database. R executes an SQL command to fetch the data and does the
> plotting with the built-in R plotting functions.
>
> The (low-level) calculations like summing, dividing, grouping, sorting
> etc. can be done either with the SQL command on the MySQL side or on
> the R side.
> My question is: which is faster for these low-level calculations / data
> rearrangements, MySQL or R? Is there a general rule of thumb for what to
> shift to the MySQL side and what to the R side?

 Given that you are already set up to test this yourself, why don't you? SELECT everything from a table and add it up in R, then SELECT sum(everything) from the same table, and compare the times (obviously your example might be more complex). Post some benchmark results together with your hardware spec, probably best to the database-flavoured R mailing list.
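 As a rough illustration of that comparison, here is a minimal sketch in Python, with an in-memory SQLite table standing in for the MySQL server so that it runs anywhere. The table name `measurements`, the column `value`, and the helper functions are all made up for the example; nothing here comes from the original poster's setup. With a real MySQL server the network round trip and the cost of transferring every row would weigh much more heavily than they do here, so treat the numbers as a demonstration of the method only.

```python
import sqlite3
import time


def build_table(n_rows):
    """Create a hypothetical in-memory table holding the integers 0..n_rows-1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (value INTEGER)")
    conn.executemany(
        "INSERT INTO measurements (value) VALUES (?)",
        ((i,) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def sum_on_client(conn):
    """SELECT every row, then add the values up on the client side."""
    rows = conn.execute("SELECT value FROM measurements").fetchall()
    return sum(value for (value,) in rows)


def sum_in_database(conn):
    """Let the database compute the sum and return a single number."""
    (total,) = conn.execute("SELECT SUM(value) FROM measurements").fetchone()
    return total


def timed(fn, conn):
    """Run fn(conn) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(conn)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    conn = build_table(10_000)
    for fn in (sum_on_client, sum_in_database):
        result, seconds = timed(fn, conn)
        print(f"{fn.__name__}: result={result}, {seconds:.6f} s")
```

 A single timing is noisy, so repeat each measurement several times before reporting anything, and run it against your real tables and queries rather than a toy like this.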

 Is the MySQL server running locally, i.e. on the same machine? Maybe PostgreSQL will be even faster? So many of these questions are problem-specific and hardware-setup related. You can get massive speedups by having more RAM, or more disk, or spreading your giant database onto multiple servers.

 Rules of thumb are rare in this world, since everyone's thumbs are different sizes and are being stuck into different sized problems.

Barry



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 01 Apr 2011 - 11:38:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
