Re: [R] memory limits in R loading a dataset and using the package tree

From: Sicotte, Hugues Ph.D. <Sicotte.Hugues_at_mayo.edu>
Date: Fri 05 Jan 2007 - 21:48:28 GMT


I agree about sampling, but you can go a little further with your hardware.
The default in R is to play nice and limit your allocation to half the available RAM. Make sure you have plenty of disk swap space (at least 1 GB with 2 GB of RAM), and you can then set your memory limit to 2 GB for R.

See help(memory.size) and use the memory.limit() function.
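
For example, a quick session sketch (these functions are Windows-only; the sizes below are illustrative, not recommendations):

  memory.limit()             # current allocation cap, in MB
  memory.limit(size = 2047)  # raise the cap, e.g. towards 2 GB
  memory.size()              # MB currently in use by R
  memory.size(max = TRUE)    # peak MB obtained from the OS so far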

Hugues

P.S. Someone let me use their Linux machine with 16 GB of RAM, and I was able to run 64-bit R with "top" showing 6 GB of RAM allocated (with suitable --max-mem-size command-line parameters at R startup).
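
(A sketch of the startup side, hedged because the exact flag names are platform-dependent: --max-mem-size is the Windows flag, while Unix builds take --max-vsize / --max-nsize. From inside the session you can check the build:)

  ## assumed invocations; adjust to your platform and R version:
  ##   Windows:  Rgui.exe --max-mem-size=2047M
  ##   Unix:     R --max-vsize=6G
  .Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit one
  gc()                      # reports current Ncells/Vcells usage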

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Weiwei Shi
Sent: Friday, January 05, 2007 2:12 PM
To: domenico pestalozzi
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] memory limits in R loading a dataset and using the package tree

IMHO, R is not good at really large-scale data mining, especially when the algorithm is complicated. The alternatives are:

1. Sample your data; sometimes you really do not need that many records, and the accuracy may already be good enough with less data loaded (see the read-and-sample sketch after this list).

2. Find an alternative (e.g., commercial software) to do the job if you really need to load everything.

3. Write a wrapper function that samples your data, loads the sample into R, and builds a model; repeat until you have n models. Then you can apply meta-learning, or simple majority voting if your problem is classification (see the bagging-style sketch below).
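
A sketch of point 1, reading a large file in chunks and keeping a random subset, so the whole file never has to fit in memory (the file name "big.csv" and the sampling rate are made-up placeholders):

  sample_read <- function(path, frac = 0.1, chunk = 10000, sep = ",") {
    con <- file(path, open = "r")
    on.exit(close(con))
    ## assumes a simple header line with no quoted/embedded separators
    hdr <- strsplit(readLines(con, n = 1), sep, fixed = TRUE)[[1]]
    out <- list()
    repeat {
      block <- try(read.table(con, sep = sep, header = FALSE,
                              nrows = chunk, col.names = hdr),
                   silent = TRUE)
      if (inherits(block, "try-error")) break        # end of file
      keep <- runif(nrow(block)) < frac              # Bernoulli sampling
      out[[length(out) + 1]] <- block[keep, , drop = FALSE]
      if (nrow(block) < chunk) break
    }
    do.call(rbind, out)
  }
  mySample <- sample_read("big.csv", frac = 0.5)     # keep about half the rows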
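And a sketch of point 3 in the same spirit: build n trees, each on a fresh manageable sample, and combine them by majority vote (reusing the hypothetical sample_read() above; variable1, "big.csv" and newdata are placeholders):

  library(tree)
  n <- 10
  models <- lapply(1:n, function(i) {
    sub <- sample_read("big.csv", frac = 0.1)   # a fresh, manageable sample
    tree(variable1 ~ ., data = sub)
  })
  ## majority vote across the n trees for a classification problem
  votes <- sapply(models, function(m)
    as.character(predict(m, newdata, type = "class")))
  prediction <- apply(votes, 1, function(v) names(which.max(table(v))))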

HTH,

On 1/4/07, domenico pestalozzi <statadat@gmail.com> wrote:
> I think this question is discussed in other threads, but I can't find
> exactly what I want.
> I'm working in Windows XP with 2 GB of memory and a Pentium 4 at 3.00 GHz.
> I need to work with large datasets, generally from 300,000 to 800,000
> records (depending on the project), and about 300 variables (...though a
> dataset with 800,000 records may not be "large" in your opinion...).
> Because we are deciding whether R will be the official software in our
> company, I'd like to know if the feasibility of using R with these
> datasets depends only on the characteristics of the "engine" (memory and
> processor). In that case we can upgrade the machine (for example, what
> memory would you recommend?).
>
> For example, I have a dataset of 200,000 records and 211 variables, but I
> can't load it because R stops working: I monitored the loading procedure
> (read.table in R) with the Windows task manager, and R blocked when the
> paging file reached 1.10 GB.
> After this I tried a sample of 100,000 records and could load the dataset
> correctly. I'd then like to use the package tree, but a few seconds after
> calling tree(variable1 ~ ., myDataset) I get the message "Reached total
> allocation of 1014Mb".
>
> I'd like your opinion and suggestions, considering that I could upgrade
> my computer's memory.
>
> pestalozzi

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat Jan 06 14:51:50 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 06 Jan 2007 - 05:30:24 GMT.
