Re: [R] Re : Large database help

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed 17 May 2006 - 02:19:07 EST

On Tue, 16 May 2006, Robert Citek wrote:

>
> On May 16, 2006, at 8:15 AM, justin bem wrote:
>
>> Try to open your db with MySQL and use RMySQL
>
> I've seen this offered up as a suggestion a few times but with little
> detail. In my experience, even using SQL to pull in data from a
> MySQL DB, R would need to load the entire data set into RAM before
> doing some calculations. But perhaps I'm using RMySQL incorrectly[1].
>
> As a toy problem, let's imagine a data set (foo) with a single
> numerical field (bar) and 1 billion records (1e9). In MySQL one
> would do the following to calculate the mean:
>
> select avg(bar) from foo ;
>
> For a smaller data set I would issue a select statement and then
> fetch the entire set into a data frame before calculating the mean.
> Given such a large data set, how would one calculate the mean using R
> connected to this MySQL database? How would one calculate the median
> using R connected to this MySQL database?
>
> Pointers to references appreciated.

Well, there *is* a manual about R Data Import/Export, and this does discuss using R with DBMSs with examples. How about reading it?

The point being made is that you can import just the columns you need, or indeed just summaries of those columns computed by the DBMS, rather than the whole table.
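
As a rough illustration of that point (not taken from the manual), the toy problem above might be sketched as follows. Several things here are assumptions: the DBI interface is used, RSQLite's in-memory database stands in for the MySQL server so that the code runs without one, the six-row table is made-up demonstration data, and the median is obtained with ORDER BY plus LIMIT/OFFSET, which is only one of several possible approaches.

```r
# Sketch only: an in-memory SQLite database stands in for MySQL here.
# With RMySQL the connection would instead come from something like
# dbConnect(RMySQL::MySQL(), dbname = ..., user = ...).
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up stand-in for table foo with a single numeric field bar.
dbWriteTable(con, "foo", data.frame(bar = c(5, 1, 4, 2, 3, 10)))

# Mean: the DBMS aggregates; R only receives a one-row result.
mean_bar <- dbGetQuery(con, "SELECT AVG(bar) AS m FROM foo")$m

# Median: count the non-NULL rows, then fetch only the middle one or
# two values of the sorted column instead of the whole column.
n      <- as.numeric(dbGetQuery(con, "SELECT COUNT(bar) AS n FROM foo")$n)
n_mid  <- if (n %% 2 == 0) 2 else 1      # two middle values when n is even
offset <- (n - n_mid) %/% 2              # rows to skip before the middle
sql    <- sprintf(
  "SELECT bar FROM foo WHERE bar IS NOT NULL ORDER BY bar LIMIT %.0f OFFSET %.0f",
  n_mid, offset)
median_bar <- mean(dbGetQuery(con, sql)$bar)

dbDisconnect(con)
```

In both cases R holds at most a couple of values, so the size of foo does not matter to R's memory use. The sort behind the median query is done by the DBMS and may be slow on a very large table unless bar is indexed.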

> [1] http://www.sourcekeg.co.uk/cran/src/contrib/Descriptions/RMySQL.html

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed May 17 02:22:41 2006

