Re: [Rd] External pointers and an apparent memory leak

From: Simon Urbanek <simon.urbanek_at_r-project.org>
Date: Thu, 15 Sep 2011 15:27:40 -0400

Jim,

ok, now we're getting somewhere ;)

The most important details are (a) Linux and (b) small allocations.

The answer is that on Linux the memory cannot be returned to the system because of the way the allocator works; it's not your fault. This has been discussed in part in https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14611

You are hitting the same issue. Note that the large allocations in your example are not affected, because glibc serves those with mmap rather than brk. You can use my mallinfo package to query the glibc allocator.
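A minimal sketch of the mmap side of this, assuming glibc on Linux: a request far above glibc's mmap threshold (128 KB by default, tunable via mallopt's M_MMAP_THRESHOLD and raised dynamically up to 32 MB on 64-bit systems) gets its own mapping, which mallinfo counts in hblks, and free() unmaps it right away. The helper names and the 64 MB test size are illustrative choices; mallinfo2() replaced the deprecated mallinfo() in glibc 2.33.

```c
/* Sketch, assuming glibc on Linux: requests above the mmap threshold get
 * their own mapping, counted in mallinfo's hblks, not in the brk heap. */
#include <malloc.h>
#include <stdlib.h>

/* Number of chunks currently served by mmap() (mallinfo's hblks field). */
long mmapped_chunk_count(void)
{
#if defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 33))
    struct mallinfo2 mi = mallinfo2();   /* size_t fields, glibc >= 2.33 */
#else
    struct mallinfo mi = mallinfo();     /* int fields, older glibc */
#endif
    return (long) mi.hblks;
}

/* Global, so the compiler cannot treat the allocation as dead and drop it. */
void *large_block;

/* Allocate `bytes` into large_block and return how many mmap()ed chunks
 * that added (-1 if malloc failed). The caller frees large_block. */
long mmap_chunks_added_by_malloc(size_t bytes)
{
    long before = mmapped_chunk_count();
    large_block = malloc(bytes);
    if (large_block == NULL)
        return -1;
    return mmapped_chunk_count() - before;
}
```

On glibc, a 1024-byte request (the size h5R_allocate_k uses in the quoted message below) should add 0 mmap()ed chunks because it is carved out of the brk heap, while a 64 MB request should add exactly 1.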

A really simple example to illustrate the issue using just R code is:

a = paste("foo", 1:1e6)
b = "bar"

Linux won't release any of the brk memory used by a, because b was allocated after it (at a higher address): even though a occupies a contiguous region, that region lies below b, so the heap cannot shrink past b. The same applies to you: your mallocs/callocs also use brk, so a single R object allocated after them is enough to block the release:

> a = paste("foo", 1:1e6)
> mallinfo()

   arena   ordblks    smblks     hblks    hblkhd   usmblks   fsmblks  uordblks 
120893440       527         1         8  30105600         0       112 120327680 
fordblks  keepcost 
  565760     41584 

> b = "bar"
> .Internal(inspect(a))

@7f9664cd2010 16 STRSXP g1c7 [MARK,NAM(1)] (len=1000000, tl=0)
 @4a67e58 09 CHARSXP g1c1 [MARK,gp=0x20] "foo 1"
 @4a67e28 09 CHARSXP g1c1 [MARK,gp=0x20] "foo 2"
 @4a67df8 09 CHARSXP g1c1 [MARK,gp=0x20] "foo 3"
 @4a67dc8 09 CHARSXP g1c1 [MARK,gp=0x20] "foo 4"
 @4a67d98 09 CHARSXP g1c1 [MARK,gp=0x20] "foo 5"
 ...
> .Internal(inspect(b))

@4a73a48 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
 @4a73a18 09 CHARSXP g0c1 [gp=0x20,ATT] "bar"
> rm(a)
> gc()

          used (Mb) gc trigger (Mb) max used  (Mb)
Ncells  132154  7.1    1801024 96.2  2133131 114.0
Vcells 1112432  8.5    7089858 54.1  6613056  50.5
> mallinfo()

   arena   ordblks    smblks     hblks    hblkhd   usmblks   fsmblks  uordblks 
120893440      1019         3         5  10096640         0       176  14944848 
fordblks  keepcost 
105948592     25440 
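The same pinning effect can be reproduced without R. A minimal sketch, assuming glibc on Linux and a single-threaded process (so everything comes from the main brk heap): allocate about 10 MB in 1 KB blocks, standing in for a's CHARSXPs or for h5R_allocate_k's Calloc(1024, char), then one small block standing in for b. After the 1 KB blocks are freed, mallinfo's fordblks grows while arena stays essentially unchanged. The block count, sizes and helper names are illustrative choices, not anything taken from R.

```c
/* Sketch, assuming glibc on Linux and a single-threaded process. */
#include <malloc.h>
#include <stdlib.h>
#include <string.h>

#define NBLOCKS 10000      /* about 10 MB in 1 KB blocks (illustrative) */
#define BLOCK_SIZE 1024
#define PIN_SIZE 16        /* plays the role of b = "bar" */

/* Globals, so the compiler cannot optimize the allocations away. */
void *blocks[NBLOCKS];
void *pin_block;

struct heap_stats {
    size_t arena;       /* bytes the heap has obtained from the OS */
    size_t in_use;      /* mallinfo uordblks */
    size_t free_bytes;  /* mallinfo fordblks: freed, but still held by the process */
};

struct heap_stats read_heap_stats(void)
{
#if defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 33))
    struct mallinfo2 mi = mallinfo2();  /* glibc >= 2.33 */
#else
    struct mallinfo mi = mallinfo();    /* older glibc */
#endif
    struct heap_stats s = { (size_t) mi.arena, (size_t) mi.uordblks,
                            (size_t) mi.fordblks };
    return s;
}

/* Allocate the small blocks, then the pin on top of them; 0 on success. */
int allocate_then_pin(void)
{
    for (int i = 0; i < NBLOCKS; i++) {
        blocks[i] = malloc(BLOCK_SIZE);
        if (blocks[i] == NULL)
            return -1;
        memset(blocks[i], 'c', BLOCK_SIZE);  /* touch the pages, like the C code in this thread */
    }
    pin_block = malloc(PIN_SIZE);
    return pin_block != NULL ? 0 : -1;
}

/* Free every small block but keep the pin alive (like rm(a) with b still around). */
void free_blocks(void)
{
    for (int i = 0; i < NBLOCKS; i++) {
        free(blocks[i]);
        blocks[i] = NULL;
    }
}
```

The mallinfo() output above shows the same bookkeeping: arena stays at 120893440 bytes across rm(a) and gc(), while uordblks falls by 105382832 bytes and fordblks rises by exactly 105382832 bytes, so roughly 100 MB moved from "in use" to "free" without leaving the process. The two changes match exactly because glibc reports uordblks as arena minus the free bytes.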


Cheers,
Simon  

On Sep 15, 2011, at 1:30 PM, James Bullard wrote:

> Hi Simon, Matt
>
> First, thank you for the help. My memory use is still growing, even though I'm clearly removing the things I allocate. Potentially it is just Linux not giving the memory back until another process needs it, but it definitely doesn't behave that way when I allocate directly within R. To be a better poster:
>
> #> system("uname -a")
> #Linux mp-f020.nanofluidics.com 2.6.32-30-server #59-Ubuntu SMP Tue Mar 1 22:46:09 UTC 2011 x86_64 #GNU/Linux
>
> #> sessionInfo()
> #R version 2.13.1 Patched (2011-09-13 r57007)
> #Platform: x86_64-unknown-linux-gnu (64-bit)
>
> Here, if you look at the successive allocations, you'll see that by the end my memory use has started to grow and, at least as measured by ps, I'm leaking memory.
>

>> showPS <- function() system(paste('ps -eo pid,vsz,%mem | grep', Sys.getpid()))
>> gcl <- function() { lapply(1:10, gc, verbose = F)[[10]] }
>> 
>> showPS()
> 18937 147828 0.1
>> m <- .Call("h5R_allocate_gig")
>> rm(m)
>> gcl()
>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 213919 11.5     407500 21.8   213919 11.5
> Vcells 168725  1.3     786432  6.0   168725  1.3
>> showPS()
> 18937 147828 0.1
>> 
>> m <- sapply(1:1000, function(a) {
> + .Call("h5R_allocate_meg")
> + })
>> rm(m)
>> gcl()
>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 213920 11.5     467875   25   213920 11.5
> Vcells 168725  1.3     786432    6   168725  1.3
>> showPS()
> 18937 147828 0.1
>> 
>> m <- sapply(1:100000, function(a) {
> + .Call("h5R_allocate_k")
> + })
>> rm(m)
>> gcl()
>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 213920 11.5     818163 43.7   213920 11.5
> Vcells 168725  1.3     895968  6.9   168725  1.3
>> showPS()
> 18937 271860 0.9
>> 
>> m <- sapply(1:1000000, function(a) {
> + .Call("h5R_allocate_k")
> + })
>> rm(m)
>> gcl()
>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 213920 11.5     785114 42.0   213920 11.5
> Vcells 168725  1.3    1582479 12.1   168725  1.3
>> showPS()
> 18937 1409568 7.8
>
> I have redone the examples to better demonstrate the issue I am seeing. Below is the C code:
>
> #include <hdf5.h>
> #include <Rinternals.h>
> #include <R.h>
>
> void h5R_allocate_finalizer(SEXP eptr) {
>     char* vector = R_ExternalPtrAddr(eptr);
>     Free(vector);
>     R_ClearExternalPtr(eptr);
> }
>
> SEXP h5R_allocate_meg() {
>     char* vector = (char*) Calloc(1048576, char);
>     for (int j = 0; j < 1048576; j++) {
>         vector[j] = 'c';
>     }
>     SEXP e_ptr = R_MakeExternalPtr(vector, R_NilValue, R_NilValue);
>     PROTECT(e_ptr);
>     R_RegisterCFinalizerEx(e_ptr, h5R_allocate_finalizer, TRUE);
>     UNPROTECT(1);
>     return e_ptr;
> }
>
> SEXP h5R_allocate_k() {
>     char* vector = (char*) Calloc(1024, char);
>     for (int j = 0; j < 1024; j++) {
>         vector[j] = 'c';
>     }
>     SEXP e_ptr = R_MakeExternalPtr(vector, R_NilValue, R_NilValue);
>     PROTECT(e_ptr);
>     R_RegisterCFinalizerEx(e_ptr, h5R_allocate_finalizer, TRUE);
>     UNPROTECT(1);
>     return e_ptr;
> }
>
> SEXP h5R_allocate_gig() {
>     char* vector = (char*) Calloc(1073741824, char);
>     for (int j = 0; j < 1073741824; j++) {
>         vector[j] = 'c';
>     }
>     SEXP e_ptr = R_MakeExternalPtr(vector, R_NilValue, R_NilValue);
>     PROTECT(e_ptr);
>     R_RegisterCFinalizerEx(e_ptr, h5R_allocate_finalizer, TRUE);
>     UNPROTECT(1);
>     return e_ptr;
> }
>
>
> Finally, when I use valgrind on the test script, I see:
>
> ==22098== 135,792 bytes in 69 blocks are possibly lost in loss record 1,832 of 1,858
> ==22098== at 0x4C274A8: malloc (vg_replace_malloc.c:236)
> ==22098== by 0x4F5D799: GetNewPage (memory.c:786)
> ==22098== by 0x4F5EE6F: Rf_allocVector (memory.c:2330)
> ==22098== by 0x4F6007F: R_MakeWeakRefC (memory.c:1198)
> ==22098== by 0xE01BACF: h5R_allocate_k (h5_debug.c:33)
> ==22098== by 0x4EE17E4: do_dotcall (dotcode.c:837)
> ==22098== by 0x4F18D02: Rf_eval (eval.c:508)
> ==22098== by 0x4F1A7FD: do_begin (eval.c:1420)
> ==22098== by 0x4F18B1A: Rf_eval (eval.c:482)
> ==22098== by 0x4F1B7FC: Rf_applyClosure (eval.c:838)
> ==22098== by 0x4F189F7: Rf_eval (eval.c:526)
> ==22098== by 0x4E6F3D8: do_lapply (apply.c:72)
>
> Thanks for any help!
>
> jim
> ________________________________________
> From: Simon Urbanek [simon.urbanek_at_r-project.org]
> Sent: Thursday, September 15, 2011 8:35 AM
> To: James Bullard
> Cc: r-devel_at_r-project.org
> Subject: Re: [Rd] External pointers and an apparent memory leak
>
> Jim,
>
> On Sep 14, 2011, at 5:21 PM, James Bullard wrote:
>
>> I'm using external pointers and seemingly leaking memory. My determination of a memory leak is that the R process continually creeps up in memory as seen by top while the usage as reported by gc() stays flat. I have isolated the C code:
>> 
>> void h5R_allocate_finalizer(SEXP eptr) {
>>   Rprintf("Calling the finalizer\n");
>>   void* vector = R_ExternalPtrAddr(eptr);
>>   free(vector);
>>   R_ClearExternalPtr(eptr);
>> }
>> 
>> SEXP h5R_allocate(SEXP size) {
>>   int i = INTEGER(size)[0];
>>   char* vector = (char*) malloc(i*sizeof(char));
>>   SEXP e_ptr = R_MakeExternalPtr(vector, R_NilValue, R_NilValue);
>>   R_RegisterCFinalizerEx(e_ptr, h5R_allocate_finalizer, TRUE);
>>   return e_ptr;
>> }
>> 
>> 
>> If I run an R program like this:
>> 
>> v <- replicate(100000, {
>> .Call("h5R_allocate", as.integer(1000000))
>> })
>> rm(v)
>> gc()
>> 

>
> This seems a little optimistic to me - at least on the machines most mortals have - since it will allocate ~93GB of memory - before rm/gc:
>
> vmmap:
>
>                                VIRTUAL  ALLOCATION      BYTES
> MALLOC ZONE                       SIZE       COUNT  ALLOCATED  % FULL
> ===========                    =======   =========  =========   ======
> DefaultMallocZone_0x1004cf000    62.8M      120044      93.5G  152363%
> environ_0x100601000              1024K          27       1280       0%
> ===========                    =======   =========  =========   ======
> TOTAL                            63.8M      120071      93.5G  149977%
>
> ps:
>
> UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
> 501 26287 26170 0 31 0 100511220 64864 - S+ s002 1:06.81 /Library/Frameworks/R.framework/Resources/bin/exec/x86_64/R
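The ~93GB figure is straightforward arithmetic: 100000 calls times 1000000 bytes is 10^11 bytes, about 93.1 GiB, close to the 93.5G that vmmap lists as allocated in the default zone. A trivial C check of that arithmetic, with illustrative function names:

```c
/* Back-of-the-envelope check of the ~93GB estimate: replicate() makes
 * 100000 .Call()s, each asking malloc() for 1000000 bytes. */
#include <stdint.h>

/* Total bytes requested; 64-bit so 10^11 does not overflow. */
uint64_t total_requested_bytes(uint64_t calls, uint64_t bytes_per_call)
{
    return calls * bytes_per_call;
}

/* Convert bytes to GiB (2^30 bytes). */
double bytes_to_gib(uint64_t bytes)
{
    return (double) bytes / (1024.0 * 1024.0 * 1024.0);
}
```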
>
> Fortunately, it's never used, so it's actually possible (purely virtual). But as Matt said, it gets released without problems - after rm/gc:
>
>> gc()

>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 433341 23.2     667722 35.7   597831 32.0
> Vcells 630031  4.9    1300721 10.0  1211088  9.3
>
>                                VIRTUAL  ALLOCATION      BYTES
> MALLOC ZONE                       SIZE       COUNT  ALLOCATED  % FULL
> ===========                    =======   =========  =========   ======
> DefaultMallocZone_0x1004cf000    59.3M       19083      36.6M     61%
> environ_0x100601000              1024K          27       1280       0%
> ===========                    =======   =========  =========   ======
> TOTAL                            60.3M       19110      36.6M     60%
>
> 501 26287 26170 0 31 0 2522872 60880 - S+ s002 1:35.69 /Library/Frameworks/R.framework/Resources/bin/exec/x86_64/R
>
>> sessionInfo()

> R version 2.13.1 (2011-07-08)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
>> Then you can see the problem (top reports that R still has a bunch of memory, but R doesn't think it does). I have tried using valgrind, and it says I have memory left on the table at the end, lest you think it is an artifact of top. I have also tried Free/Calloc, and that doesn't make a difference. Finally, I see this in both R-2-12 (patched) and R-2-13. I think it is more an understanding issue on my part.
>> 

>
> You didn't mention your OS. Some OSes do not release memory immediately (some wait until you try to allocate new memory), and some can't release certain types of memory at all. Also, depending on your OS's allocation library, you can get more info about the allocation pool to understand what is going on, but for that you'd have to share the platform info with us ...
>
>
>> thanks much in advance, to me it really resembles the connection.c code, but what am I missing?
>> 

>
> Cheers,
> Simon
>
>
> PS: This has nothing to do with your question, but I'd suggest checking the result of malloc, e.g.,
> if (!vector) Rf_error("unable to allocate %d bytes", i);
> Also, i = asInteger(size) is much safer (and more convenient) than i = INTEGER(size)[0], and, completely irrelevantly, as.integer(1000000) is more efficiently written as 1000000L.
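A plain-C sketch of the two checks suggested in the PS, with the R API left out so it compiles on its own. allocate_checked() is a hypothetical helper, not part of R; in real .Call code, n would come from asInteger(size) and the NULL branches would call Rf_error() as shown above.

```c
/* Sketch of validating the size and checking malloc's result.
 * allocate_checked() is a hypothetical helper; returns NULL on any failure. */
#include <limits.h>
#include <stdlib.h>

char *allocate_checked(int n)
{
    /* asInteger() yields NA_INTEGER (INT_MIN in R) for NA input, so
     * rejecting n <= 0 covers NA as well as zero and negative sizes. */
    if (n <= 0)
        return NULL;
    char *vector = malloc((size_t) n);
    /* malloc() returns NULL on failure; the caller must check before use. */
    return vector;
}
```

Returning NULL keeps the sketch free of R headers; the caller decides how to report the failure.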
>
>
>> thanks, jim
>> 
>> 
>> 
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 

>
>
>


Received on Thu 15 Sep 2011 - 19:41:53 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 16 Sep 2011 - 04:50:34 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.
