Re: [R] Practical Data Limitations with R

From: Philipp Pagel <p.pagel_at_wzw.tum.de>
Date: Tue, 08 Apr 2008 17:20:11 +0200

On Tue, Apr 08, 2008 at 09:26:22AM -0500, Jeff Royce wrote:
> We are new to R and evaluating if we can use it for a project we need to
> do. We have read that R is not well suited to handle very large data
> sets. Assuming we have the data prepped and stored in an RDBMS (Oracle,
> Teradata, SQL Server), what can R reasonably handle from a volume
> perspective? Are there some guidelines on memory/machine sizing based
> on data volume? We need to be able to handle Millions of Rows from
> several sources.

As so often, the answer is "it depends". R has no limit on the number of rows that matters at this scale (the hard cap is 2^31 - 1 rows per data frame) - the available memory determines how big a dataset you can fit into RAM. So often the answer would be "yes - just buy more RAM".
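To get a feeling for the memory side, a back-of-the-envelope estimate can help. The following is only a rough sketch: it assumes every column is numeric (8 bytes per value), and estimate_gib() is a made-up helper, not something that ships with R. Real data frames with strings or factors will differ.

```r
# Rough size of a purely numeric table: rows * cols * 8 bytes per double.
# estimate_gib() is a hypothetical helper for illustration only.
estimate_gib <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1024^3
}

# Example: 5 million rows by 20 numeric columns.
est <- estimate_gib(5e6, 20)
cat(sprintf("Estimated size: %.2f GiB\n", est))

# Sanity check on a small sample that certainly fits in RAM:
# object.size() reports what R actually allocates for the object.
small <- as.data.frame(matrix(rnorm(1e5 * 20), ncol = 20))
measured <- as.numeric(object.size(small))
cat(sprintf("Measured: %.1f MiB for 1e5 rows\n", measured / 1024^2))
```

Keep in mind that many R operations make temporary copies, so it is wise to plan for a few times the raw size rather than for the raw size alone.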

A couple million rows are no problem at all if you don't have too many columns (done that). If you really have a very large set of data which you cannot fit into memory, you may still be able to use R: do you really need ALL the data in memory at the same time? Often, very large datasets actually contain many different subsets of data which you want to analyze separately anyway. In that case, storing the full data in an RDBMS and selecting the required subsets as needed is usually the best solution.
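A minimal sketch of the "select subsets as needed" idea, under some assumptions: it uses the DBI and RSQLite packages (both must be installed), with an in-memory SQLite database standing in for Oracle, Teradata or SQL Server. The table name "measurements" and its columns are invented for the example.

```r
# Assumes the DBI and RSQLite packages are installed. SQLite is only a
# stand-in for the real RDBMS; table and column names are made up.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Stand-in for the large table that would already live in the database.
set.seed(1)
dbWriteTable(con, "measurements", data.frame(
  source = rep(c("A", "B", "C"), each = 1000),
  value  = rnorm(3000)
))

# Pull one subset at a time and keep only a small summary per subset,
# so the full table never has to sit in R's memory at once.
sources <- dbGetQuery(con, "SELECT DISTINCT source FROM measurements")$source
means <- sapply(sources, function(s) {
  subset_df <- dbGetQuery(
    con,
    "SELECT value FROM measurements WHERE source = ?",
    params = list(s)
  )
  mean(subset_df$value)
})

dbDisconnect(con)
print(means)
```

With a production database only the dbConnect() call should need to change (a different DBI backend or an ODBC driver); the query-per-subset pattern stays the same.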

In your situation, I would simply load the full dataset into R and see what happens.

cu

        Philipp

-- 
Dr. Philipp Pagel                              Tel.  +49-8161-71 2131
Lehrstuhl für Genomorientierte Bioinformatik   Fax.  +49-8161-71 2186
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
 
 and
 
Institut für Bioinformatik und Systembiologie / MIPS
Helmholtz Zentrum München -
Deutsches Forschungszentrum für Gesundheit und Umwelt
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 08 Apr 2008 - 15:25:52 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 08 Apr 2008 - 15:30:27 GMT.
