[R] managing large datasets with RMySQL

From: Tamas K Papp <tpapp_at_princeton.edu>
Date: Wed 10 Aug 2005 - 02:13:52 EST


I have a large dataset (about 1 million data points from a 68-dimensional state space, the result of an MCMC simulation) which won't fit in memory. I think the only way to analyze it is to save it in a relational database as it is generated and then read back only portions of the data at a time.

I have installed & initialized MySQL and the RMySQL package (I know nothing about SQL, unfortunately, but I will try to learn). The code from section 4.3.1 of the R Data Import/Export manual runs successfully.

Questions:

  1. Should I use dbWriteTable(..., overwrite=FALSE, append=TRUE) to save the chunks of data repeatedly?
  2. Is it OK to set row.names=FALSE when writing?
  3. How do I retrieve only parts of the data? dbReadTable returns the whole table, if I understand correctly.
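
For concreteness, the workflow behind the three questions might look roughly like the sketch below. It is only an illustration under assumptions that are not in the original message: the table name mcmc_draws, an integer column iter that numbers the draws (needed to select ranges once row names are dropped), and the helper names append_chunk, range_query, read_range and for_each_block are all hypothetical. The calls go through the DBI generics that RMySQL implements; older versions spell dbFetch as fetch.

```r
# Requires the DBI and RMySQL packages plus a running MySQL server to use;
# the definitions themselves load in plain R.

# Append one chunk (a data.frame that includes an 'iter' column) to the table.
# With append = TRUE the table is created on the first call and extended on
# later calls; row.names = FALSE avoids storing a redundant row_names column.
append_chunk <- function(con, chunk, table = "mcmc_draws") {
  DBI::dbWriteTable(con, table, chunk, append = TRUE, row.names = FALSE)
}

# Build a SELECT statement for draws first..last (assumed 'iter' column).
range_query <- function(table, first, last, columns = "*") {
  sprintf("SELECT %s FROM %s WHERE iter BETWEEN %d AND %d",
          paste(columns, collapse = ", "), table,
          as.integer(first), as.integer(last))
}

# Read only the requested range of draws into memory.
read_range <- function(con, first, last, table = "mcmc_draws", columns = "*") {
  DBI::dbGetQuery(con, range_query(table, first, last, columns))
}

# Walk through the whole table block_size rows at a time, applying f to each
# block, so that only one block is held in memory at once.
for_each_block <- function(con, f, block_size = 10000, table = "mcmc_draws") {
  res <- DBI::dbSendQuery(con, sprintf("SELECT * FROM %s ORDER BY iter", table))
  on.exit(DBI::dbClearResult(res))
  while (!DBI::dbHasCompleted(res)) {
    block <- DBI::dbFetch(res, n = block_size)
    if (nrow(block) > 0) f(block)
  }
  invisible(NULL)
}

# Usage sketch (connection details are placeholders, not from the post):
# con <- DBI::dbConnect(RMySQL::MySQL(), dbname = "test")
# append_chunk(con, data.frame(iter = 1:1000, x1 = rnorm(1000)))
# first_draws <- read_range(con, 1, 500, columns = c("iter", "x1"))
# for_each_block(con, function(b) print(colMeans(b)))
# DBI::dbDisconnect(con)
```

Selecting by an explicit iter column rather than by row position is a design choice here: a SQL table has no inherent row order, so some stored index is needed to ask for "draws 1 to 500" reliably.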

If somebody has written code for analyzing data in parts before, I would appreciate it if they could send it.

Thanks,

Tamas



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Aug 10 02:18:52 2005
