Re: [Rd] Re: [R] Memory Fragmentation in R

From: Nawaaz Ahmed <>
Date: Sun 20 Feb 2005 - 13:40:02 EST

> I am unclear what you actually did, but it may be a judicious gc() is
> all that was needed: otherwise the issues should be the same in the
> first and the subsequent run. That's not to say that when the trigger
> gets near the total address space we could not do better: and perhaps we
> should not let it to do so (if we could actually determine the size of
> the address space ... it is 2Gb or 3Gb on Windows for example).

I did do gc(), but only in the top-level functions; there were internal functions in libraries/packages that were allocating space.

Here is how I think the problem happens. Consider code of the form

         x = as.vector(x)
         y = as.vector(y)

where x is a 500MB matrix and y is a 100MB vector.

Let's say we have 1201MB in total.

         x has 500MB, y has 100MB
         heap can grow by 601MB

         x = as.vector(x):
            x has 500MB, y has 100MB
            as.vector() duplicates 500MB (to be garbage collected)
            heap can grow by 101MB

         y = as.vector(y):
            x has 500MB, y has 100MB
            R has 500MB to be garbage collected
            as.vector() requires 100MB for duplicating y
            garbage collector is not run
                - required amount (100MB) < possible heap growth (101MB)
            allocVector() calls malloc()
                - malloc() can fail at this point
                - it cannot get a contiguous 100MB

You are right, it is most likely to happen close to the trigger. But the fix should be easy (call gc() if malloc() fails). I initially hacked at stealing vectors from the free list because I thought the problem I was seeing was due to address-space fragmentation; the latter could still be a problem and would be harder to fix.

Thanks Luke and Brian!
Nawaaz

Received on Sun Feb 20 12:49:28 2005

This archive was generated by hypermail 2.1.8 : Sun 20 Feb 2005 - 13:30:08 EST