Re: [R] RJDBC vs RMySQL vs ???

From: James W. MacDonald <>
Date: Wed, 23 Jun 2010 16:36:29 -0400

Hi Ralf,

Ralf B wrote:
> I am running a simple SQL SELECT statement that involves 50k+ data
> points using R and the RJDBC interface. I am facing very slow response
> times in both the RGUI and the R console. When running this SQL
> statement directly in a SQL client I have processing times that are a
> lot faster (which means that the SQL statement itself is not the
> problem).
> Did any of you compare RJDBC vs RMySQL or is there a better, more
> efficient way to extract large data from databases using R? Would you
> recommend dumping data out completely into flat files and working with
> flat files instead? I expected that this would not be such a problem
> given that businesses maintain their data in DBs and R is supposed to
> be good at shifting data around. Am I doing something wrong?

Well, if you don't show people what you have done, how can anybody tell if you are doing something wrong or not?

I have no experience with RJDBC, so cannot say anything about that. However, I have always found RMySQL to be speedy enough. As an example:

 > library(RMySQL)
Loading required package: DBI
 > con <- dbConnect("MySQL", host="", user = "genome", dbname = "hg18")
 > system.time(a <- dbGetQuery(con, "select name, chromEnd from snp129 where chrom='chr1' and chromStart between 1 and 1e8;"))

    user system elapsed
    7.95 0.06 38.59
 > dim(a)
[1] 508676 2

So 40 seconds to get half a million records. Since this is via the internet, I have to imagine things would be much faster querying a local DB.

But then you never say what constitutes 'slow' for you, so maybe this is slow as well?
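
To put a number on 'slow', the same kind of timing could be wrapped in a small helper and run once per interface. This is only a sketch: time_query is a made-up name, not something provided by DBI or RMySQL, and it assumes con is an already-open connection from a driver that implements the DBI interface (RMySQL does, and RJDBC should as well).

```r
# Hypothetical helper (not part of DBI or RMySQL): run one query on an
# already-open DBI connection and report the elapsed time and row count.
time_query <- function(con, sql) {
  # system.time() evaluates the assignment here, so `result` is local
  timing <- system.time(result <- DBI::dbGetQuery(con, sql))
  list(elapsed = unname(timing["elapsed"]),  # wall-clock seconds
       rows    = nrow(result),               # number of records returned
       data    = result)
}

# Usage, assuming `con` was created with dbConnect() as in the example above:
# out <- time_query(con, "select name, chromEnd from snp129 where chrom='chr1' and chromStart between 1 and 1e8;")
# out$elapsed
# out$rows
```

Running the identical query through each connection and comparing elapsed against the row count would at least show whether the interface or the network is the bottleneck.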



> Ralf
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

James W. MacDonald, M.S.
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 

Received on Wed 23 Jun 2010 - 20:39:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Jun 2010 - 21:30:36 GMT.
