Re: [Rd] modifying large R objects in place

From: Petr Savicky <>
Date: Fri, 28 Sep 2007 13:45:16 +0200

On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote: [...]
> >nrow <- function(...) dim(...)[1]
> >ncol <- function(...) dim(...)[2]
> >
> >At least in my environment, the new versions preserved NAMED == 1.
> >
> Yes, but changing the formal arguments is a bit messy, is it not?

For nrow and ncol specifically, I think not: almost nobody needs to know (or even knows) that the formal argument is named "x".
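As a small experiment (not from the original post, and version-dependent: tracemem() needs a build with memory profiling, and since R 4.0 true reference counting has replaced the old NAMED heuristic), one can watch whether calling the dotted variant forces a later copy of its argument; the name nrow2 is hypothetical:

```r
## Hypothetical sketch: the "..." variant of nrow discussed above.
## Under the old NAMED rules it left NAMED(x) at 1, so the later
## assignment did not duplicate x; tracemem() reports any copy.
nrow2 <- function(...) dim(...)[1]

x <- matrix(0, 100, 100)
tracemem(x)          # print a message if x is ever duplicated
invisible(nrow2(x))  # with nrow(x) instead, old R bumped NAMED to 2
x[1, 1] <- 1         # ...and this line would then copy all of x
untracemem(x)
```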

However, there is another argument against the ... solution: it solves the problem only in the simplest cases like nrow and ncol, but is not usable in others, such as colSums and rowSums. These functions also increase NAMED on their argument, even though their output contains no reference to the original contents of that argument.
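The effect described here can be made visible with tracemem(); this is a sketch under the same caveats as above (old-style NAMED semantics, memory profiling enabled), not a claim about current R:

```r
## Hedged sketch: in R versions of that era, colSums(x) raised
## NAMED(x) to 2, so a later in-place-looking modification of x
## forced a full duplication, even though the result s holds no
## reference to x's data.
x <- matrix(1, 100, 100)
tracemem(x)
s <- colSums(x)  # a fresh numeric vector, independent of x
x[1, 1] <- 0     # under NAMED == 2 this assignment copied all of x
untracemem(x)
```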

I think that a systematic solution of this problem would be helpful. However, making these functions Internal or Primitive would not be good in my opinion. It is an advantage that these functions contain an R-level part, which makes the basic decisions before the call to .Internal. If nothing else, this serves as a form of documentation.

For my purposes, I replaced the calls to "colSums" and "matrix" in my script by the corresponding calls to .Internal. The result is that I can now complete several runs of my calculation in a loop instead of restarting R after each run.

This leads me to a question. Some of the tests I did suggest that gc() may not free all the memory, even if I remove all data objects with rm() before calling gc(). Is this possible, or have I missed something?

A possible solution to the unwanted increase of NAMED during temporary calculations could be to give the user the possibility to save the NAMED attribute of an object before a call to a function and restore it after the call. To use this, the user would have to be confident that no new reference to the object persists after the function completes.

> Presumably, nrow <- function(x) eval.parent(substitute(dim(x)[1])) works
> too, but if the gain is important enough to warrant that sort of
> programming, you might as well make nrow a .Primitive.

You are right. This indeed works.
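Spelled out, the suggestion quoted above works like this (the name nrow3 is hypothetical): substitute() rebuilds the call dim(<caller's expression>)[1], and eval.parent() evaluates it in the caller's frame, so no local binding of the argument is ever created inside the function:

```r
## Peter Dalgaard's suggested definition, under a hypothetical name.
## substitute(dim(x)[1]) turns into dim(m)[1] when called as nrow3(m),
## and eval.parent() evaluates that expression in the caller's frame.
nrow3 <- function(x) eval.parent(substitute(dim(x)[1]))

m <- matrix(0, nrow = 3, ncol = 4)
nrow3(m)  # 3, the same value as nrow(m)
```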

> Longer-term, I still have some hope for better reference counting, but
> the semantics of environments make it really ugly -- an environment can
> contain an object that contains the environment, a simple example being
> f <- function()
> g <- function() 0
> f()
> At the end of f(), we should decide whether to destroy f's evaluation
> environment. In the present example, what we need to be able to see is
> that this would remove all references to g and that the reference from g
> to f can therefore be ignored. Complete logic for sorting this out is
> basically equivalent to a new garbage collector, and one can suspect
> that applying the logic upon every function return is going to be
> terribly inefficient. However, partial heuristics might apply.

I have to say that I do not fully understand the example. What are the input and output of f? Is the inner g only defined, or also used?

Let me ask the following question. I assume that gc() scans the whole memory and determines, for each piece of data, whether a reference to it still exists. In my understanding, this is equivalent to determining whether its NAMED may be dropped to zero. Structures for which this succeeds are then removed. Am I right? If so, would it also be possible during gc() to detect the cases where NAMED may be dropped from 2 to 1? How much would this increase the complexity of gc()?

Thank you in advance for your kind reply.

Petr Savicky

Received on Fri 28 Sep 2007 - 11:47:20 GMT
