From: Roger D. Peng <rdpeng_at_gmail.com>

Date: Tue 18 Jul 2006 - 23:40:27 EST

> The answer depends on a lot of things, but most importantly
> 1) What you are going to do with the data
> 2) Whether you have a 32-bit or 64-bit version of R
> 3) How much memory your computer has.
>
> In a 32-bit version of R (where R will not be allowed to address more than
> 2-3Gb of memory) an object of size 350Mb is large enough to cause problems
> (see eg the R Installation and Administration Guide).
>
> If your 350Mb data set has lots of variables and you only use a few at a
> time then you may not have any trouble even on a 32-bit system once you
> have read in the data.
>
> If you have a 64-bit version of R and a few Gb of memory then there should
> be no real difficulty in working with that size of data set for most
> analyses. You might come across some analyses (eg some cluster analysis
> functions) that use n^2 memory for n observations and so break down.
>
> -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley@u.washington.edu University of Washington, Seattle
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
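
To put rough numbers on the n^2 remark in the quoted reply, here is a minimal sketch of the arithmetic. It is written in Python purely as a calculator and is not R code. The function names and the observation counts are made up for illustration, and it assumes every pairwise value is stored as an 8-byte double, which is how R stores numeric values.

```python
# Rough memory estimate for analyses that keep all pairwise values
# (for example distances) between n observations.
# Assumption: each value is an 8-byte double.
BYTES_PER_DOUBLE = 8


def full_matrix_bytes(n: int) -> int:
    """Bytes for a dense n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE


def lower_triangle_bytes(n: int) -> int:
    """Bytes when only the n*(n-1)/2 distinct pairs are stored."""
    return n * (n - 1) // 2 * BYTES_PER_DOUBLE


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Hypothetical observation counts, chosen only to show the growth.
for n in (1_000, 10_000, 50_000):
    full = to_gib(full_matrix_bytes(n))
    tri = to_gib(lower_triangle_bytes(n))
    print(f"n={n:>6}: full matrix {full:6.2f} GiB, lower triangle {tri:6.2f} GiB")
```

Under these assumptions, 50,000 observations need 20,000,000,000 bytes (roughly 18.6 GiB) for a full matrix, far beyond the 2-3Gb a 32-bit R can address, even though the data set itself may be only a few hundred Mb.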

In my experience, the OS's use of virtual memory is only relevant in the rough sense that the OS can store *other* running applications in virtual memory so that R can use as much of the physical memory as possible. Once R itself overflows into virtual memory it quickly becomes unusable.

I'm not sure I understand your second question. As R is available in source code form, it can be compiled for many 64-bit operating systems.

-roger

Marshall Feldman wrote:

> Hi,
>
> I have two further comments/questions about large datasets in R.
>
> 1. Does R's ability to handle large datasets depend on the operating
> system's use of virtual memory? In theory, at least, VM should make the
> difference between installed RAM and virtual memory on a hard drive
> primarily a determinant of how fast R will calculate rather than whether or
> not it can do the calculations. However, if R has some low-level routines
> that have to be memory resident and use more memory as the amount of data
> increases, this may not hold. Can someone shed light on this?
>
> 2. What 64-bit versions of R are available at present?
>
> Marsh Feldman
> The University of Rhode Island
>
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley@u.washington.edu]
> Sent: Monday, July 17, 2006 3:21 PM
> To: Deepankar Basu
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Large datasets in R
>
> On Mon, 17 Jul 2006, Deepankar Basu wrote:
>
>> Hi!
>>
>> I am a student of economics and currently do most of my statistical work
>> using STATA. For various reasons (not least of which is an aversion to
>> proprietary software), I am thinking of shifting to R. At the current
>> juncture my concern is the following: would I be able to work on
>> relatively large data-sets using R? For instance, I am currently working
>> on a data-set which is about 350MB in size. Would it be possible to work
>> with data-sets of such sizes using R?

> The answer depends on a lot of things, but most importantly

--
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue Jul 18 23:42:52 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 19 Jul 2006 - 00:21:11 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting guide before posting to the list.