Re: [R] Res: gc() and memory efficiency

From: Henrik Bengtsson <hb_at_stat.berkeley.edu>
Date: Wed, 6 Feb 2008 19:25:36 -0800

Open suggestion/question:

If, in each step of a K-step iteration, you load/allocate a large object, each time of a different size, followed by smaller memory allocations (for your analysis), you may be better off ordering the iteration so that the largest object comes in the first iteration, the 2nd largest in the 2nd, and so on.

Example: if done in the wrong order, you can end up with fragmented memory as follows:

Suboptimal:

1. Allocate '40% object': [40% object][10% misc][50% free] (blocks in the memory image)
2. Free 'object': [40% free][10% misc][50% free]
3. Allocate '50% object': [40% free][10% misc][50% object]
4. Free 'object': [40% free][10% misc][50% free]
5. Allocate '60% object': Failure to allocate that amount of memory!

Optimal:

1. Allocate '60% object': [60% object][10% misc][30% free]
2. Free 'object': [60% free][10% misc][30% free]
3. Allocate '50% object': [50% object][10% free][10% misc][30% free]
4. Free 'object': [60% free][10% misc][30% free]
5. Allocate '40% object': [40% object][20% free][10% misc][30% free]
6. Free 'object': [60% free][10% misc][30% free]
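The ordering idea above can be sketched in R. The file names and the analysis step here are hypothetical placeholders, and size on disk is used only as a rough proxy for in-memory size:

```r
## Hypothetical file names; substitute your own data sets.
files <- c("small.rda", "big.rda", "medium.rda")

## Use size on disk as a rough proxy for in-memory size,
## and process the largest data set first.
sizes <- file.info(files)$size
for (f in files[order(sizes, decreasing = TRUE)]) {
    x <- get(load(f))   # load the largest remaining object
    ## ... perform the analysis on 'x' ...
    rm(x)
    gc()                # free it before the next, smaller, allocation
}
```

This way the first (largest) allocation leaves a gap that every subsequent, smaller object can reuse, instead of a small early allocation splitting the address space in two.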

/Henrik

On Feb 6, 2008 5:35 PM, Milton Cezar Ribeiro <milton_ruser_at_yahoo.com.br> wrote:
> Dear Harold,
>
> I had the same problem some time ago. I noticed that after I had run a set of commands (cleaning all non-useful variables) 5 times, the system broke down. I solved it by building several scriptNN.R files and calling them from a .BAT DOS file. It worked fine, at least in my case, and the computer ran for several days without stopping :-)
>
> If you need more info on this solution, feel free to write me again.
>
> On the other hand, if you find a better solution (I also, unfortunately, run Windows), please share it with us.
>
> Kind regards
>
> Miltinho
> Brazil
>
>
>
>
> ----- Original message ----
> From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
> To: "Doran, Harold" <HDoran_at_air.org>
> Cc: r-help_at_r-project.org
> Sent: Tuesday, 5 February 2008 3:06:51
> Subject: Re: [R] gc() and memory efficiency
>
>
> 1) See ?"Memory-limits": it is almost certainly memory fragmentation.
> You don't need to give the memory back to the OS (and few OSes actually do
> so).
>
> 2) I've never seen this running a 64-bit version of R.
>
> 3) You can easily write a script to do this. Indeed, you could write an R
> script to run multiple R scripts in separate processes in turn (via
> system("Rscript fileN.R")). For example, Uwe Ligges uses R to script
> building and testing of packages on Windows.
>
> On Mon, 4 Feb 2008, Doran, Harold wrote:
>
> > I have a program which reads in a very large data set, performs some
> > analyses, and then repeats this process with another data set. As soon
> > as the first set of analyses is complete, I remove the very large
> > object and clean up to try to make memory available in order to run the
> > second set of analyses. The process looks something like this:
> >
> > 1) read in data set 1 and perform analyses
> > rm(list=ls())
> > gc()
> > 2) read in data set 2 and perform analyses
> > rm(list=ls())
> > gc()
> > ...
> >
> > But, it appears that I am not making the memory that was consumed in
> > step 1 available back to the OS as R complains that it cannot allocate a
> > vector of size X as the process tries to repeat in step 2.
> >
> > So, I close and reopen R and then drop in the code to run the second
> > analysis. When this is done, I close and reopen R and run the third
> > analysis.
> >
> > This is terribly inefficient. Instead I would rather just source in the
> > R code and let the analyses run over night.
> >
> > Is there a way that I can use gc() or some other function more
> > efficiently rather than having to close and reopen R at each iteration?
> >
> > I'm using Windows XP and R 2.6.1
> >
> > Harold
> >
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford
> 1 South Parks Road, Oxford OX1 3TG, UK
> Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
> Fax: +44 1865 272595
>
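Prof. Ripley's point 3 above can be sketched as a small driver script; the script file names here are hypothetical. Because each analysis runs in its own short-lived R process, all of its memory is returned to the OS when that process exits, so fragmentation cannot accumulate across analyses:

```r
## Hypothetical script names; each one reads its data set,
## runs its analysis, and saves its results to disk.
scripts <- c("analysis1.R", "analysis2.R", "analysis3.R")

for (s in scripts) {
    ## Run each script in a separate R process via the Rscript front-end.
    status <- system(paste("Rscript", s))
    if (status != 0)
        warning("non-zero exit status from ", s)
}
```

Sourcing this one driver file lets the whole batch run overnight without ever restarting the controlling R session by hand.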



Received on Thu 07 Feb 2008 - 03:29:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 07 Feb 2008 - 07:30:12 GMT.
