Re: [R] RJDBC vs RMySQL vs ???

From: James W. MacDonald <jmacdon_at_med.umich.edu>
Date: Wed, 23 Jun 2010 16:36:29 -0400

Hi Ralf,

Ralf B wrote:
> I am running a simple SQL SELECT statement that involvs 50k + data
> points using R and the RJDBC interface. I am facing very slow response
> times in both the RGUI and the R console. When running this SQL
> statement directly in a SQL client I have processing times that are a
> lot lot faster (which means that the SQL statement itself is not the
> problem).
>
> Did any of you compare RJDBC vs RMySQL or is there a better, more
> efficient way to extract large data from databases using R? Would you
> recommend dumping data out completely into flat files and working with
> flat files instead? I expected that this would not be such a problem
> given that businesses maintain their data in DBs and R is supposed to
> be good in shifting around data. Am I doing something wrong?

Well, if you don't show people what you have done, how can anybody tell if you are doing something wrong or not?

I have no experience with RJDBC, so cannot say anything about that. However, I have always found RMySQL to be speedy enough. As an example:

 > library(RMySQL)
Loading required package: DBI
 > con <- dbConnect("MySQL", host="genome-mysql.cse.ucsc.edu", user = "genome", dbname = "hg18")
 > system.time(a <- dbGetQuery(con, "select name, chromEnd from snp129 where chrom='chr1' and chromStart between 1 and 1e8;") + )

    user system elapsed
    7.95 0.06 38.59
 > dim(a)
[1] 508676 2

So 40 seconds to get half a million records. Since this is via the internet, I have to imagine things would be much faster querying a local DB.

But then you never say what constitutes 'slow' for you, so maybe this is slow as well?

Best,

Jim

>
> Ralf
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 23 Jun 2010 - 20:39:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Jun 2010 - 21:30:36 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive