Re: [R] FW: Large datasets in R

From: Roger D. Peng
Date: Tue 18 Jul 2006 - 23:40:27 EST

In my experience, the OS's use of virtual memory is only relevant in the rough sense that the OS can store *other* running applications in virtual memory so that R can use as much of the physical memory as possible. Once R itself overflows into virtual memory it quickly becomes unusable.
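A rough way to judge whether a data set will stay within physical memory is to estimate its in-memory footprint first. The sketch below is illustrative arithmetic only, written in Python rather than R. It assumes every value is held as an 8-byte double (as R does for numeric vectors); the function names, the row and column counts, and the `copies` multiplier for temporary working copies are hypothetical and not taken from this thread.

```python
BYTES_PER_DOUBLE = 8  # R stores numeric values as 8-byte doubles


def estimated_bytes(n_rows, n_cols, copies=1):
    """Approximate bytes for an all-numeric table.

    `copies` is an assumed multiplier for temporary copies made
    during an analysis; the true number depends on the operation.
    """
    return n_rows * n_cols * BYTES_PER_DOUBLE * copies


def fits_in_ram(n_rows, n_cols, ram_bytes, copies=1):
    """True if the estimate does not exceed the given physical RAM."""
    return estimated_bytes(n_rows, n_cols, copies) <= ram_bytes


if __name__ == "__main__":
    # Hypothetical example: 1,000,000 rows by 50 numeric columns.
    size = estimated_bytes(1_000_000, 50)
    print(f"estimated size: {size / 2**20:.0f} MiB")
    print("fits in 2 GiB:", fits_in_ram(1_000_000, 50, 2 * 2**30))
```

The estimate ignores object overhead and non-numeric columns, so treat it as a lower bound rather than a precise figure.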

I'm not sure I understand your second question. As R is available in source code form, it can be compiled for many 64-bit operating systems.


Marshall Feldman wrote:
> Hi,
> I have two further comments/questions about large datasets in R.
> 1. Does R's ability to handle large datasets depend on the operating
> system's use of virtual memory? In theory, at least, VM should make the
> difference between installed RAM and virtual memory on a hard drive
> primarily a determinant of how fast R will calculate rather than whether or
> not it can do the calculations. However, if R has some low-level routines
> that have to be memory resident and use more memory as the amount of data
> increases, this may not hold. Can someone shed light on this?

> 2. What 64-bit versions of R are available at present?

> Marsh Feldman
> The University of Rhode Island
> -----Original Message-----
> From: Thomas Lumley
> Sent: Monday, July 17, 2006 3:21 PM
> To: Deepankar Basu
> Subject: Re: [R] Large datasets in R
> On Mon, 17 Jul 2006, Deepankar Basu wrote:

>> Hi!
>> I am a student of economics and currently do most of my statistical work
>> using STATA. For various reasons (not least of which is an aversion to
>> proprietary software), I am thinking of shifting to R. At the current
>> juncture my concern is the following: would I be able to work on
>> relatively large data-sets using R? For instance, I am currently working
>> on a data-set which is about 350MB in size. Would it be possible to work
>> with data-sets of such sizes using R?

> The answer depends on a lot of things, but most importantly
> 1) What you are going to do with the data
> 2) Whether you have a 32-bit or 64-bit version of R
> 3) How much memory your computer has.
> In a 32-bit version of R (where R will not be allowed to address more than
> 2-3Gb of memory) an object of size 350Mb is large enough to cause problems
> (see eg the R Installation and Administration Guide).
> If your 350Mb data set has lots of variables and you only use a few at a
> time then you may not have any trouble even on a 32-bit system once you
> have read in the data.
> If you have a 64-bit version of R and a few Gb of memory then there should
> be no real difficulty in working with that size of data set for most
> analyses. You might come across some analyses (eg some cluster analysis
> functions) that use n^2 memory for n observations and so break down.
> -thomas
> Thomas Lumley Assoc. Professor, Biostatistics
> University of Washington, Seattle
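
To put numbers on the n^2 point in the message above: a method that builds a full n-by-n matrix of 8-byte doubles needs 8 * n^2 bytes, and one that keeps only the lower triangle needs 8 * n * (n - 1) / 2. The Python sketch below is only illustrative arithmetic; the function names and the values of n are assumptions, and actual cluster analysis functions may use more or less memory than this.

```python
BYTES_PER_DOUBLE = 8  # size of one double-precision value


def full_matrix_bytes(n):
    """Bytes for a dense n-by-n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE


def lower_triangle_bytes(n):
    """Bytes for the n*(n-1)/2 pairwise entries below the diagonal."""
    return n * (n - 1) // 2 * BYTES_PER_DOUBLE


if __name__ == "__main__":
    # Hypothetical observation counts, chosen only to show the growth.
    for n in (1_000, 10_000, 100_000):
        full_gib = full_matrix_bytes(n) / 2**30
        tri_gib = lower_triangle_bytes(n) / 2**30
        print(f"n={n}: full {full_gib:.2f} GiB, triangle {tri_gib:.2f} GiB")
```

Multiplying n by 10 multiplies the memory by roughly 100, which is why such analyses can fail even when the data set itself fits comfortably.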
> ______________________________________________
> mailing list
> PLEASE do read the posting guide!
> and provide commented, minimal, self-contained, reproducible code.
Roger D. Peng

Received on Tue Jul 18 23:42:52 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 19 Jul 2006 - 00:21:11 EST.
