Re: [R] memory limit

From: Stavros Macrakis <macrakis_at_alum.mit.edu>
Date: Wed, 26 Nov 2008 16:16:41 -0500

I routinely compute with a 2,500,000-row dataset with 16 columns, which takes 410MB of storage; my Windows box has 4GB, which avoids thrashing. 32-bit Windows R is limited to about 1.5GB of address space in total, including any intermediate results, so I have to be careful not to compute and save multiple copies of the entire data frame. Within that limit, R works impressively well and fast with this dataset for selections, calculations, cross-tabs, plotting, etc. For example, simple single-column statistics and cross-tabs take << 1 sec., and a summary of the whole thing takes 16 sec. A linear regression between two numeric columns takes < 20 sec. Plotting all 2.5M points takes a while, but that is no surprise (and is usually pointless [sic] anyway). I have not tried any compute-intensive statistical calculations on the whole data set.
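For anyone wanting to check figures like these on their own data, here is a minimal sketch of measuring object size and elapsed time in R. The data frame `dat` and its columns are hypothetical stand-ins (the actual dataset is not shown in this thread), and the row count is scaled down so the sketch runs quickly; timings will differ by machine.

```r
# Hypothetical stand-in data; the poster's actual dataset is not shown.
set.seed(1)
n <- 100000   # scaled down from 2,500,000 rows so the sketch runs quickly
dat <- data.frame(
  x = rnorm(n),
  y = rnorm(n),
  g = sample(c("a", "b", "c"), n, replace = TRUE)
)

# Memory used by the data frame itself
size_bytes <- as.numeric(object.size(dat))
print(object.size(dat), units = "MB")

# Elapsed time for a cross-tab, a summary, and a two-column regression
t_tab <- system.time(tab <- table(dat$g))
t_sum <- system.time(s <- summary(dat))
t_lm  <- system.time(fit <- lm(y ~ x, data = dat))
print(c(table   = t_tab[["elapsed"]],
        summary = t_sum[["elapsed"]],
        lm      = t_lm[["elapsed"]]))
```

Raising `n` toward the real row count gives a rough idea of how size and run time scale before committing to loading the full dataset.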

The main (but minor) annoyance with it is that it takes about 90 secs to load into memory using R's native binary "save" format, so I tend to keep the process lying around rather than re-starting and re-loading for each analysis. Fortunately, garbage collection is very effective in reclaiming unused storage as long as I'm careful to remove unnecessary objects.
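The save, reload, and clean-up cycle described above can be sketched as follows. This is only an illustration under assumptions: the object name `big`, its columns, and the temporary file path are made up for the example, and the data is far smaller than the dataset discussed here.

```r
# Hypothetical object and file path, for illustration only.
path <- tempfile(fileext = ".RData")
big <- data.frame(id = seq_len(100000), value = rnorm(100000))

# Write the object in R's native binary "save" format
save(big, file = path)

# Remove the object, then ask the garbage collector to reclaim the memory
rm(big)
invisible(gc())

# load() restores the object under its original name and
# returns the names of the objects it restored
t_load <- system.time(loaded <- load(path))
print(loaded)
print(t_load[["elapsed"]])

unlink(path)   # clean up the temporary file
```

Calling `rm()` on large intermediates as soon as they are no longer needed matters most when address space is tight, since `gc()` can only reclaim objects that nothing refers to any more.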

            -s

On Wed, Nov 26, 2008 at 7:42 AM, iwalters <iwalters_at_cellc.co.za> wrote:
>
> I'm currently working with very large datasets that consist of 1,000,000+
> rows. Is it at all possible to use R for datasets of this size, or should I
> rather consider C++/Java?
>
>
> --
> View this message in context: http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Wed 26 Nov 2008 - 21:20:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 27 Nov 2008 - 00:30:29 GMT.
