Re: [R] Reading huge chunks of data from MySQL into Windows R

From: Dubravko Dolic <Dubravko.Dolic_at_komdat.com>
Date: Tue 07 Jun 2005 - 00:43:26 EST


In my (limited) experience, R is more powerful for data manipulation. An example: I have a vector holding user ids, and some user ids appear more than once. Running SELECT COUNT(DISTINCT userid) in MySQL takes approx. 15 min. Running length(unique(userid)) in R takes (almost) no time...
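
A minimal sketch of the R side of this comparison, using simulated ids. The vector length and the id range are assumptions chosen for illustration, not figures from this thread, and the sketch assumes the ids are already in memory (the time needed to transfer them from MySQL is not included).

```r
# Simulated data (hypothetical): 1e6 rows drawn from 50,000 possible user ids.
set.seed(1)
userid <- sample.int(50000, size = 1e6, replace = TRUE)

# R counterpart of SELECT COUNT(DISTINCT userid):
# drop the duplicates, then count what is left.
n_distinct <- length(unique(userid))

# Cross-check with a second base-R formulation of the same count.
stopifnot(n_distinct == sum(!duplicated(userid)))

n_distinct
```

One difference to keep in mind: COUNT(DISTINCT ...) ignores NULL values, while unique() keeps NA as a value of its own. If the column can contain missing ids, length(unique(userid[!is.na(userid)])) would be the closer match.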

So I think the other way round will serve best: Do everything in R and avoid using SQL on the database...

-----Original Message-----

From: bogdan romocea [mailto:br44114@yahoo.com]
Sent: Monday, June 6, 2005 16:27
To: Dubravko Dolic
Cc: r-help@stat.math.ethz.ch
Subject: RE: [R] Reading huge chunks of data from MySQL into Windows R

You don't say what you want to do with the data, how many columns you have etc. However, I would suggest proceeding in this order:

1. Avoid R; do everything in MySQL.
2. Use random samples.
3. If for some reason you need to process all 160 million rows in R, do it in a loop. Pull no more than, say, 50-100k rows at a time. This approach would allow you to process billions of rows without the memory and disk requirements going through the roof.
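
The loop in point 3 could be sketched roughly as follows. To keep the example runnable without a database, an in-memory data frame stands in for the MySQL table; fake_table, fetch_chunk and the row counts are hypothetical names and values, not part of the original setup. The task shown (counting distinct user ids) is only one example of a summary that can be updated chunk by chunk.

```r
# Hypothetical stand-in for the database table: 250,000 rows of user ids.
set.seed(42)
fake_table <- data.frame(userid = sample.int(20000, 250000, replace = TRUE))

# Hypothetical fetch function: returns rows (offset + 1) .. (offset + n),
# or a zero-row data frame once the table is exhausted.
# Against a real database this would be a query along the lines of
#   SELECT userid FROM <table> ORDER BY <key> LIMIT <offset>, <n>
fetch_chunk <- function(offset, n) {
  if (offset >= nrow(fake_table)) return(fake_table[0, , drop = FALSE])
  last <- min(offset + n, nrow(fake_table))
  fake_table[(offset + 1):last, , drop = FALSE]
}

chunk_size <- 50000        # within the 50-100k rows suggested above
offset     <- 0
rows_seen  <- 0
seen_ids   <- integer(0)   # running result: distinct ids found so far

repeat {
  chunk <- fetch_chunk(offset, chunk_size)
  if (nrow(chunk) == 0) break   # no rows left: stop

  # Update the small running summary; the chunk itself is not kept.
  seen_ids  <- union(seen_ids, chunk$userid)
  rows_seen <- rows_seen + nrow(chunk)
  offset    <- offset + chunk_size
}

n_distinct <- length(seen_ids)
```

Memory use is bounded by one chunk plus the running summary, which is what lets the same loop scale to far more rows than fit in RAM. On a real MySQL table, large LIMIT offsets tend to get slow because the skipped rows still have to be scanned; selecting ranges of an indexed key (for example WHERE id > last id seen) is a common alternative.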

hth,
b.

-----Original Message-----

From: Dubravko Dolic [mailto:Dubravko.Dolic@komdat.com]
Sent: Monday, June 06, 2005 9:31 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Reading huge chunks of data from MySQL into Windows R

Dear List,  

I'm trying to use R under Windows on a huge database in MySQL via ODBC (technical reasons for this...). Now I want to read tables with some 160,000,000 entries into R. I would be grateful if anyone out there has some good hints on what to consider concerning memory management. I'm not sure about the best methods for reading such huge files into R. For the moment I split the whole table into readable parts and stick them together in R again.

Any hints welcome.      

Dubravko Dolic

Statistical Analyst  

Email: dubravko.dolic@komdat.com


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Jun 07 01:22:25 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:32:23 EST