Re: [R] memory limit

From: Henrik Bengtsson <hb_at_stat.berkeley.edu>
Date: Wed, 26 Nov 2008 16:07:33 -0800

On Wed, Nov 26, 2008 at 1:16 PM, Stavros Macrakis <macrakis_at_alum.mit.edu> wrote:
> I routinely compute with a 2,500,000-row dataset with 16 columns,
> which takes 410MB of storage; my Windows box has 4GB, which avoids
> thrashing. As long as I'm careful not to compute and save multiple
> copies of the entire data frame (because 32-bit Windows R is limited
> to about 1.5GB address space total, including any intermediate
> results), R works impressively well and fast with this dataset for
> selections, calculations, cross-tabs, plotting, etc. For example,
> simple single-column statistics and cross-tabs take << 1 sec., summary
> of the whole thing takes 16 sec. A linear regression between two
> numeric columns takes < 20 sec. Plotting of all 2.5M points takes a
> while, but that is no surprise (and is usually pointless [sic]
> anyway). I have not tried to do any compute-intensive statistical
> calculations on the whole data set.
>
> The main (but minor) annoyance with it is that it takes about 90 secs
> to load into memory using R's native binary "save" format, so I tend
> to keep the process lying around rather than re-starting and
> re-loading for each analysis. Fortunately, garbage collection is very
> effective in reclaiming unused storage as long as I'm careful to
> remove unnecessary objects.
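
The housekeeping described above can be sketched roughly as follows. This is only an illustration: the data frame is a small, made-up stand-in (not the 2.5M-row dataset), and the names `df` and `df_scaled` are hypothetical. It uses object.size() to check an object's footprint, then rm() and gc() to release a copy that is no longer needed.

```r
# Hypothetical stand-in data: 100,000 rows x 16 numeric columns.
n <- 100000
df <- as.data.frame(matrix(rnorm(n * 16), nrow = n, ncol = 16))

# Approximate memory occupied by the data frame
# (at least 8 bytes per numeric value, plus some overhead).
size_bytes <- as.numeric(object.size(df))
print(object.size(df), units = "Mb")

# A derived copy of the whole data frame roughly doubles the footprint
# for as long as it exists.
df_scaled <- as.data.frame(scale(df))

# Remove the copy once it is no longer needed, then ask the garbage
# collector to reclaim the space right away.
rm(df_scaled)
invisible(gc())
```

On a 32-bit build with a limited address space, keeping such intermediate copies short-lived is what leaves room for the next computation.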

FYI, objects saved with save(..., compress=FALSE) are notably faster to read back.
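
One way to check this on your own data is a quick comparison like the sketch below. The data frame, the temporary file names, and the environments are made up for illustration, and the timings and file sizes will depend on the machine and the data.

```r
# Hypothetical stand-in data; substitute your own object.
n <- 200000
df <- data.frame(id = seq_len(n),
                 x  = rnorm(n),
                 g  = sample(letters, n, replace = TRUE))

f_comp <- tempfile(fileext = ".RData")
f_raw  <- tempfile(fileext = ".RData")

save(df, file = f_comp)                    # default: compressed
save(df, file = f_raw, compress = FALSE)   # uncompressed

# Load each file into its own environment and time the read.
e_comp <- new.env()
e_raw  <- new.env()
t_comp <- system.time(load(f_comp, envir = e_comp))[["elapsed"]]
t_raw  <- system.time(load(f_raw,  envir = e_raw))[["elapsed"]]

cat("compressed:   ", file.size(f_comp), "bytes,", t_comp, "s to load\n")
cat("uncompressed: ", file.size(f_raw),  "bytes,", t_raw,  "s to load\n")

# Both files should restore the same object.
same <- identical(e_comp$df, e_raw$df)
```

The trade-off is disk space: the uncompressed file is typically larger, so this mainly pays off when load time matters more than file size.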

/Henrik

>
> -s
>
>
> On Wed, Nov 26, 2008 at 7:42 AM, iwalters <iwalters@cellc.co.za> wrote:
>>
>> I'm currently working with very large datasets that consist of 1,000,000+
>> rows. Is it at all possible to use R for datasets this size, or should I
>> rather consider C++/Java?
>>
>>
>> --
>> View this message in context: http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



Received on Thu 27 Nov 2008 - 00:10:40 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 27 Nov 2008 - 01:30:28 GMT.
