Re: [R] Large datasets in R

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Tue 18 Jul 2006 - 05:20:42 EST

On Mon, 17 Jul 2006, Deepankar Basu wrote:

> Hi!
>
> I am a student of economics and currently do most of my statistical work
> using STATA. For various reasons (not least of which is an aversion for
> proprietary software), I am thinking of shifting to R. At the current
> juncture my concern is the following: would I be able to work on
> relatively large data-sets using R? For instance, I am currently working
> on a data-set which is about 350MB in size. Would it be possible to work
> with data-sets of such sizes using R?

The answer depends on a lot of things, but most importantly:

1) What you are going to do with the data.
2) Whether you have a 32-bit or 64-bit version of R.
3) How much memory your computer has.

In a 32-bit version of R (where R will not be allowed to address more than 2-3 GB of memory) an object of size 350 MB is large enough to cause problems, since many operations make temporary copies of the data they work on (see e.g. the R Installation and Administration manual).
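As a rough illustration of the sizes involved, here is a back-of-the-envelope sketch. The row and column counts are hypothetical, chosen only so that the result lands near 350 MB, and it assumes every column is stored as a double (8 bytes per value).

```r
# Hypothetical dimensions, for illustration only.
n_rows <- 1e6
n_cols <- 45

# A numeric (double) value takes 8 bytes, so an all-numeric
# data set needs roughly rows * columns * 8 bytes in memory.
est_mb <- n_rows * n_cols * 8 / 2^20
est_mb

# object.size() reports what an existing object actually occupies;
# here it is checked against the rule of thumb on a small vector.
x <- numeric(1e5)
object.size(x)
```

The in-memory size can differ from the size of the file on disk in either direction, so it is worth checking with object.size() on a subset before reading everything.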

If your 350 MB data set has lots of variables and you only use a few at a time, then once you have read in the data you may not have any trouble even on a 32-bit system.
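One way to keep only a few variables is to skip the others while reading the file. The sketch below assumes the data are in a CSV file; the file and its column names are made up so that the example runs on its own.

```r
# Write a small hypothetical CSV file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
full <- data.frame(id     = 1:5,
                   income = c(10.5, 20, 31, 42.25, 55),
                   region = c("N", "S", "E", "W", "N"),
                   age    = c(23L, 35L, 41L, 29L, 52L))
write.csv(full, tmp, row.names = FALSE)

# One colClasses entry per column, in file order.
# The string "NULL" tells read.csv to skip that column entirely.
small <- read.csv(tmp, colClasses = c("integer", "numeric", "NULL", "NULL"))
names(small)
```

Only the id and income columns are kept in `small`; the skipped columns never become part of the resulting data frame.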

If you have a 64-bit version of R and a few GB of memory then there should be no real difficulty in working with that size of data set for most analyses. You might come across some analyses (e.g. some cluster analysis functions) that use memory proportional to n^2 for n observations and so break down.
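To see how quickly n^2 memory grows, here is a small sketch. It assumes the analysis works from a full distance object as returned by dist(), which stores n(n-1)/2 doubles; the helper name dist_mb is made up for this example.

```r
# Approximate size in MB of a dist() object for n observations:
# n * (n - 1) / 2 pairwise distances at 8 bytes each.
dist_mb <- function(n) n * (n - 1) / 2 * 8 / 2^20

# Sizes for one thousand, ten thousand and one hundred thousand rows.
dist_mb(c(1e3, 1e4, 1e5))
```

Under that assumption ten thousand observations need a few hundred MB for the distances alone, and one hundred thousand need tens of GB, so such methods can fail even when the data set itself fits comfortably in memory.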

         -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Jul 18 06:38:26 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.