[Rd] Improved version of Rprofmem

From: Radford Neal <radford_at_cs.toronto.edu>
Date: Sun, 14 Aug 2011 12:01:01 -0400


The Rprofmem facility is currently enabled only if the configuration option --enable-memory-profiling is used. However, the overhead of having it enabled is negligible when profiling is not actually being done, and can easily be made even smaller. So I think it ought to be enabled all the time.

I've attached a patch that does this and also makes a number of other improvements to Rprofmem, all of which are upward-compatible with the current version.

First, it allows for the profiling reports to be printed to the terminal (with Rprintf) as well as or instead of going to a file. This is not only a convenience, but also provides more information when these reports are interspersed with other output, allowing the source of the memory allocations to be better determined.

Second, it gives the option for the allocation reports to include the type of the vector and its length, not just the number of bytes allocated.

Third, it allows for all vector allocations to be reported, not just those that are large enough to be done with malloc (this distinction doesn't seem important for most uses). It also allows for reports to be produced only when the vector allocated has at least some number of elements, rather than only providing a threshold based on number of bytes. This seems more natural if a user knows that they are dealing with some vectors of length 100000 and wants to see only those (or bigger ones).
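
To illustrate why an element-count threshold can be more convenient than a byte threshold, here is a small sketch of the arithmetic a user would otherwise have to do. The helper name data_bytes is hypothetical, and the figures cover only the data portion of a vector; the per-vector header overhead depends on the platform and R version, so it is left out.

```r
# Bytes of data per element for the common atomic vector types.
# The vector header is not included, since its size varies between builds.
elem_bytes <- c(logical = 4, integer = 4, double = 8, complex = 16, raw = 1)

# Hypothetical helper: data bytes needed for a vector of a given type and length.
data_bytes <- function(type, n) unname(elem_bytes[type]) * n

# A vector of length 100000 needs a different byte threshold for each type,
# whereas a single nelem value covers all of them.
n <- 100000
sapply(names(elem_bytes), data_bytes, n = n)
```

With a byte threshold the user has to pick one of these values and so implicitly commit to a type; with nelem=100000 no such choice is needed.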

Also, if either the option for terminal output or for type and length details is used, allocation reports always end with a newline. For some reason, the present code doesn't write a newline if the call stack happens to be empty. (Though not documented, this is clearly deliberate, not a bug.) Though I can't see why one would want this, it is retained as the default behaviour for backward compatibility.

Finally, for values that are cast with (unsigned long), I changed the printf format from %ld, which I think is not correct, to %lu.

I think incorporating this in the upcoming 2.14.0 release would be useful. For instance, the following gives some useful insights into what R is doing with memory allocation:

     > Rprofmem("",terminal=TRUE,pages=FALSE,details=TRUE,nelem=10000)
     > f <- function (x)
     + { cat("Now in f\n");
     +   s <- sum(x);
     +   cat("sum =",s,"\n");
     +   x[10] <- 1;
     +   s <- sum(x);
     +   cat("sum =",s,"\n")
     +   x[20] <- 1;
     +   s <- sum(x);
     +   cat("sum =",s,"\n")
     +   y <<- x
     +   NULL
     + }
     > f(rep(2,10000))
     Now in f
     RPROFMEM: 40040 (integer 10000):"f"
     RPROFMEM: 40040 (integer 10000):"f"
     RPROFMEM: 80040 (double 10000):"f"
     sum = 20000
     RPROFMEM: 80040 (double 10000):"f"
     sum = 19999
     sum = 19998
     RPROFMEM: 80040 (double 10000):"f"
     NULL
     > y[1] <- 0
     > Rprofmem("")
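
Because terminal reports are interspersed with ordinary output, it may be handy to pull them back out of a captured session for summarising. The following is only a sketch: parse_rprofmem is a hypothetical helper, not part of the patch, and the regular expression assumes that reports look exactly like the "RPROFMEM:" lines in the session above (bytes, then type and length in parentheses, then the call stack).

```r
# Hypothetical helper: split terminal reports of the assumed form
#   RPROFMEM: <bytes> (<type> <length>):<call stack>
# into a data frame. Lines that do not match the pattern are dropped.
parse_rprofmem <- function(lines) {
  pattern <- "^RPROFMEM: ([0-9]+) \\(([a-z]+) ([0-9]+)\\):(.*)$"
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- hits[lengths(hits) == 5]   # full match plus four capture groups
  data.frame(
    bytes  = as.numeric(vapply(hits, `[`, "", 2)),
    type   = vapply(hits, `[`, "", 3),
    length = as.numeric(vapply(hits, `[`, "", 4)),
    stack  = trimws(vapply(hits, `[`, "", 5)),
    stringsAsFactors = FALSE
  )
}

# A few lines copied from the session above, mixed with ordinary output.
out <- c('Now in f',
         'RPROFMEM: 40040 (integer 10000):"f"',
         'RPROFMEM: 80040 (double 10000):"f"',
         'sum = 20000')
reports <- parse_rprofmem(out)

# Total bytes allocated per vector type.
aggregate(bytes ~ type, data = reports, FUN = sum)
```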

You can see the details of my modifications from the output of help(Rprofmem) after applying the patch I have attached, which I've put below:

Rprofmem                 package:utils                 R Documentation

Enable Profiling of R's Memory Use

Description:

     Enable or disable reporting of memory allocation in R.

Usage:

     Rprofmem(filename = "Rprofmem.out", append = FALSE, 
              threshold = 0, nelem = 0, 
              terminal = FALSE, pages = TRUE, details = FALSE)
     

Arguments:

filename: The file to which reports of memory allocations are written,
          or 'NULL' or '""' if reports should not go to a file.

  append: logical: should the file be appended to rather than
          overwritten?

threshold: numeric: only allocations of vectors with size larger than
          this number of bytes will be reported.

   nelem: numeric: only allocations of vectors with at least this many
          elements will be reported.

terminal: logical: should reports be printed on the terminal (as well
          as possibly written to 'filename')?

   pages: logical: should allocation of pages for small vectors be
          reported, and reporting of individual small vector
          allocations suppressed?

 details: logical: should details of allocation be reported, rather
          than only the total number of bytes?

Details:

     The profiler tracks memory allocations, some of which will be to
     previously used memory and will not increase the total memory use
     of R.

     Calling 'Rprofmem' with either 'terminal=TRUE' or with 'filename'
     something other than 'NULL' or '""' (or both) will enable
     profiling, with allocation reports going to one or both places.
     Reports to the terminal are preceded by "RPROFMEM:".  Enabling
     profiling automatically disables any existing profiling to another
     or the same file or to the terminal.

     Calling 'Rprofmem' with 'terminal=FALSE' (the default) and
     'filename' either 'NULL' or '""' will disable profiling.

     If 'pages=TRUE' (the default) allocations of individual vectors
     will be reported only if they are "large", and allocations of
     pages to hold small vectors will be reported.  The size of a page
     of memory and the size over which a vector is "large" (and hence
     for which 'malloc' is used) are compile-time constants, by default
     2000 and 128 bytes respectively.

     If 'pages=FALSE', allocations of all vectors with size over
     'threshold' and number of elements at least 'nelem' are reported,
     and page allocations are not reported.

     A report of an allocation of a vector (to 'filename' and/or the
     terminal) will contain the number of bytes allocated and the names
     of functions in the call stack.  If 'details=TRUE' (not the
     default), the type and length of the vector allocated will also be
     displayed (in parentheses) before the call stack.

     An allocation of a page for small vectors (when 'pages=TRUE') will
     result in a report consisting of "new page:" followed by the call
     stack.

     When 'terminal=TRUE' or 'details=TRUE', a newline is always
     written after each allocation report.  For backward compatibility,
     this is otherwise not the case when the call stack is empty.

Value:

     None

Note:

     The memory profiler can be used at the same time as other R and C
     profilers.

See Also:

     The R sampling profiler, 'Rprof', also collects memory information.

     'tracemem' traces duplications of specific objects.

     The "Writing R Extensions" manual section on "Tidying and
     profiling R code"

Examples:

     # Reports printed to the terminal, with details, for all vectors of 
     # at least 10 elements.
     Rprofmem("", terminal=TRUE, pages=FALSE, details=TRUE, nelem=10)
     v <- numeric(10)
     v[3] <- 1
     u <- v
     v[3] <- 2
     Rprofmem("")
     
     ## Not run:
     
     # Reports go to a file.
     Rprofmem("Rprofmem.out", threshold=1000)
     example(glm)
     Rprofmem(NULL)
     noquote(readLines("Rprofmem.out", n=5))
     ## End(Not run)
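
The enabling and filtering rules described in the Details section above can be summarised as a small model in plain R. This is only an illustration of the documented behaviour, not the code in the patch (which is in C); the function names profiling_enabled and reported are hypothetical.

```r
# Model of the documented rule for whether a call to Rprofmem turns
# profiling on: either terminal output is requested, or 'filename' is
# something other than NULL or "".
profiling_enabled <- function(filename = "Rprofmem.out", terminal = FALSE) {
  has_file <- !is.null(filename) && nzchar(filename)
  isTRUE(terminal) || has_file
}

# Model of the documented filter for vector allocations when pages=FALSE:
# size strictly larger than 'threshold' and at least 'nelem' elements.
reported <- function(bytes, length, threshold = 0, nelem = 0) {
  bytes > threshold & length >= nelem
}

profiling_enabled("", terminal = TRUE)   # terminal only
profiling_enabled(NULL)                  # neither destination: disabled
reported(bytes = c(840, 80040), length = c(100, 10000), nelem = 10000)
```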

______________________________________________

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun 14 Aug 2011 - 16:29:24 GMT
