Re: [Rd] Re: [R] Memory Fragmentation in R

From: Luke Tierney <luke_at_stat.uiowa.edu>
Date: Sun 20 Feb 2005 - 09:58:23 EST

On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:

> Thanks Brian. I looked at the code (memory.c) after I sent out the first
> email and noticed the malloc() call that you mention in your reply.
> Looking into this code suggested a possible scenario where R would fail in
> malloc() even if it had enough free heap address space.
>
> I noticed that if there is enough heap address space (memory.c:1796,
> VHEAP_FREE() > alloc_size) then the garbage collector is not run. So malloc
> could fail (since there is no more address space to use), even though R
> itself has enough free space it can reclaim. A simple fix is for R to try
> doing garbage collection if malloc() fails.
>
> I hacked memory.c to look in R_GenHeap[LARGE_NODE_CLASS].New if malloc()
> fails (in a very similar fashion to ReleaseLargeFreeVectors()).
> I did a "best-fit" stealing from this list and returned it to allocVector().
> This seemed to fix my particular problem - the large vectors that I had
> allocated in the previous round were still sitting in this list. Of course,
> the right thing to do is to check if there are any free vectors of the right
> size before calling malloc() - but it was simpler to do it my way (because I
> did not have to worry about how efficient my best-fit was; memory allocation
> was anyway going to fail).
>
> I can look deeper into this and provide more details if needed.

Thanks. It looks like it would be a good idea to modify the allocation at
that point so that, if the malloc fails, R runs a GC and retries the malloc,
bailing out only if the second malloc also fails. I want to think this
through a bit more before going ahead, but I think it will be the right
thing to do.

Best,

luke

>
> Nawaaz
>
> Prof Brian Ripley wrote:
>> BTW, I think this is really an R-devel question, and if you want to pursue
>> this please use that list. (See the posting guide as to why I think so.)
>>
>> This looks like fragmentation of the address space: many of us are using
>> 64-bit OSes with 2-4Gb of RAM precisely to avoid such fragmentation.
>>
>> Notice (memory.c line 1829 in the current sources) that large vectors are
>> malloc-ed separately, so this is a malloc failure, and there is not a lot R
>> can do about how malloc fragments the (presumably in your case as you did
>> not say) 32-bit process address space.
>>
>> The message
>> 1101.7 Mbytes of heap free (51%)
>> is a legacy of an earlier gc() and is not really `free': I believe it means
>> something like `may be allocated before garbage collection is triggered':
>> see memory.c.
>>
>>
>> On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:
>>
>>> I have a data set of roughly 700MB which during processing grows up to 2G
>>> (I'm using a 4G Linux box). After the work is done I clean up (rm()) and
>>> the state is returned to 700MB. Yet I find I cannot run the same routine
>>> again as it claims to not be able to allocate memory even though gcinfo()
>>> claims there is 1.1G left.
>>>
>>> At the start of the second time
>>> ===============================
>>>            used  (Mb) gc trigger   (Mb)
>>> Ncells  2261001  60.4    3493455   93.3
>>> Vcells 98828592 754.1  279952797 2135.9
>>>
>>> Before Failing
>>> ==============
>>> Garbage collection 459 = 312+51+96 (level 0) ...
>>> 1222596 cons cells free (34%)
>>> 1101.7 Mbytes of heap free (51%)
>>> Error: cannot allocate vector of size 559481 Kb
>>>
>>> This looks like a fragmentation problem. Anyone have a handle on this
>>> situation? (i.e., any workaround?) Anyone working on improving R's
>>> fragmentation problems?
>>>
>>> On the other hand, is it possible there is a memory leak? In order to make
>>> my functions work on this dataset I tried to eliminate copies by coding
>>> with references (basic new.env() tricks). I presume that my cleaning up
>>> returned the temporary data (as evidenced by the gc output at the start of
>>> the second round of processing). Is it possible that it was not really
>>> cleaned up and is sitting around somewhere even though gc() thinks it has
>>> been returned?
>>>
>>> Thanks - any clues to follow up will be very helpful.
>>> Nawaaz
>>
>>
>
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke@stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Received on Sun Feb 20 09:11:54 2005
