Re: [R] FW: Large datasets in R

From: Roger D. Peng <rdpeng_at_gmail.com>
Date: Tue 18 Jul 2006 - 23:40:27 EST

In my experience, the OS's use of virtual memory is relevant only in the rough sense that the OS can page *other* running applications out to virtual memory so that R can use as much of the physical memory as possible. Once R itself overflows into virtual memory, it quickly becomes unusable.
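
A quick way to see how close you are to that point from within R is gc(), which reports the memory R is currently using, together with object.size() for individual objects. A minimal sketch (x here stands for any object you already have in your workspace):

    gc()             # report the memory R is currently using
    object.size(x)   # approximate number of bytes consumed by the object x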

I'm not sure I understand your second question. As R is available in source code form, it can be compiled for many 64-bit operating systems.
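
One quick way to check whether a particular build of R is 64-bit is to look at the pointer size it was compiled with:

    .Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit build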

-roger

Marshall Feldman wrote:
> Hi,
>
> I have two further comments/questions about large datasets in R.
>
> 1. Does R's ability to handle large datasets depend on the operating
> system's use of virtual memory? In theory, at least, VM should make the
> difference between installed RAM and virtual memory on a hard drive
> primarily a determinant of how fast R will calculate rather than whether or
> not it can do the calculations. However, if R has some low-level routines
> that have to be memory resident and use more memory as the amount of data
> increases, this may not hold. Can someone shed light on this?
>
> 2. What 64-bit versions of R are available at present?
>
> Marsh Feldman
> The University of Rhode Island
>
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley@u.washington.edu]
> Sent: Monday, July 17, 2006 3:21 PM
> To: Deepankar Basu
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Large datasets in R
>
> On Mon, 17 Jul 2006, Deepankar Basu wrote:
>
>> Hi!
>>
>> I am a student of economics and currently do most of my statistical work
>> using STATA. For various reasons (not least of which is an aversion to
>> proprietary software), I am thinking of shifting to R. At the current
>> juncture my concern is the following: would I be able to work on
>> relatively large data-sets using R? For instance, I am currently working
>> on a data-set which is about 350MB in size. Would it be possible to work
>> with data-sets of that size using R?
>
> The answer depends on a lot of things, but most importantly:
> 1) What you are going to do with the data
> 2) Whether you have a 32-bit or 64-bit version of R
> 3) How much memory your computer has.
>
> In a 32-bit version of R (where R will not be allowed to address more than
> 2-3GB of memory), an object of size 350MB is large enough to cause problems
> (see e.g. the R Installation and Administration Guide).
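
To illustrate the arithmetic with made-up dimensions: a numeric value in R takes 8 bytes, so a table of doubles with 5 million rows and 9 columns already occupies about 340 MB before any copies are made, and read.table() can transiently need a multiple of that while it parses the file:

    n <- 5e6; p <- 9      # hypothetical dimensions for a ~350 MB table
    8 * n * p / 2^20      # ~343 MB for the raw doubles alone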
>
> If your 350MB data set has lots of variables and you only use a few at a
> time, then you may not have any trouble even on a 32-bit system once you
> have read in the data.
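
One way to read just a subset of the columns, sketched here with a hypothetical file name and layout, is the colClasses argument of read.table(), where an entry of "NULL" skips that column entirely:

    ## keep columns 1 and 3 as numeric, skip columns 2 and 4
    dat <- read.table("big.dat", header = TRUE,
                      colClasses = c("numeric", "NULL", "numeric", "NULL"))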
>
> If you have a 64-bit version of R and a few GB of memory, then there should
> be no real difficulty in working with a data set of that size for most
> analyses. You might come across some analyses (e.g. some cluster analysis
> functions) that use n^2 memory for n observations and so break down.
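
For example, hierarchical clustering typically starts from a distance matrix, and dist() stores n(n-1)/2 doubles, so memory grows quadratically in the number of observations:

    n <- 1e5                      # hypothetical number of observations
    n * (n - 1) / 2 * 8 / 2^30    # ~37 GB just for the lower triangle

The quadratic term dominates long before the raw data itself is the problem.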
>
>
> -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley@u.washington.edu University of Washington, Seattle
>
-- 
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.