Re: [Rd] Re: [R] Memory Fragmentation in R

From: Prof Brian Ripley <>
Date: Sun 20 Feb 2005 - 10:00:44 EST

I am not the expert here (the author, Luke Tierney, is probably listening), but I understood you to have done a gc() immediately before your second run: you presented statistics from it. If so, then I don't understand in detail. Probably Luke does.

That's good general advice: clear out results you no longer need and run gc() before starting a memory-intensive task (it also helps, if you are timing things, not to include the time spent gc()-ing previous work). In 32-bit days I sometimes ran gc() at the end of each simulation run just to give malloc the maximal chance to clean up the allocations.

On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:

> Thanks Brian. I looked at the code (memory.c) after I sent out the first
> email and noticed the malloc() call that you mention in your reply.
> Looking into this code suggested a possible scenario where R would fail in
> malloc() even if it had enough free heap address space.
> I noticed that if there is enough heap address space (memory.c:1796,
> VHEAP_FREE() > alloc_size)

I don't think that quite corresponds to your words: it is rather that a successful allocation would not provoke a gc (unless gctorture() is on).

> then the garbage collector is not run. So malloc
> could fail (since there is no more address space to use), even though R
> itself has enough free space it can reclaim. A simple fix is for R to try
> doing garbage collection if malloc() fails.

I believe running ReleaseLargeFreeVectors would suffice.

> I hacked memory.c to look in R_GenHeap[LARGE_NODE_CLASS].New if malloc()
> fails (in a very similar fashion to ReleaseLargeFreeVectors()).
> I did a "best-fit" stealing from this list and returned it to allocVector().
> This seemed to fix my particular problem - the large vectors that I had
> allocated in the previous round were still sitting in this list.

They should have been released by the gc() you presented the statistics from, and they would have been included in those statistics if still in use at that point. So, I don't understand why they are still around.

> Of course, the right thing to do is to check if there are any free
> vectors of the right size before calling malloc() - but it was simpler
> to do it my way (because I did not have to worry about how efficient my
> best-fit was; memory allocation was anyway going to fail).

I rather doubt that is better than letting malloc sort this out, as it might be able to consolidate blocks if given them all back at once.

> I can look deeper into this and provide more details if needed.

I am unclear what you actually did, but it may be that a judicious gc() is all that was needed: otherwise the issues should be the same in the first and the subsequent runs. That's not to say that we could not do better when the trigger gets near the total address space: and perhaps we should not let it do so (if we could actually determine the size of the address space ... it is 2GB or 3GB on Windows, for example).

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Sun Feb 20 09:12:41 2005

This archive was generated by hypermail 2.1.8 : Fri 18 Mar 2005 - 09:02:57 EST