Re: [Rd] modifying large R objects in place

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu, 27 Sep 2007 15:45:03 +0100 (BST)

  1. You implicitly coerced 'a' to be numeric and thereby (almost) doubled its size: did you intend to? Does that explain your confusion?
  2. I expected NAMED on 'a' to be incremented by nrow(a): here is my understanding.

When you called nrow(a) you created another reference to 'a' in the evaluation frame of nrow. (At a finer level you first created a promise to 'a' and then dim(x) evaluated that promise, which did SET_NAMED(<SEXP>, 2).) So NAMED(a) was correctly bumped to 2, and it is never reduced.

More generally, any argument to a closure that actually gets used will get NAMED set to 2.
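
As a rough illustration of why length(u) and dim(u) leave NAMED alone while nrow(u) and ncol(u) do not: the former are primitives, the latter are closures that receive their argument as a promise. The sketch below only checks those facts from the R prompt. ignore_arg is a hypothetical helper, not part of R, and nothing here inspects NAMED itself.

```r
# 'length' and 'dim' are primitives: no closure frame and no promise
# is created for the argument.
is.primitive(length)   # TRUE
is.primitive(dim)      # TRUE

# 'nrow' and 'ncol' are ordinary closures that call dim() on their argument.
is.primitive(nrow)     # FALSE
typeof(ncol)           # "closure"

# Hypothetical helper: a closure argument is a promise and is only
# evaluated if the body actually uses it.
ignore_arg <- function(x) "argument never touched"
res <- ignore_arg(stop("this promise is never forced"))
res                    # "argument never touched"
```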

Having too high a value of NAMED could never be a 'bug'. See the explanation in the R Internals manual:

   When an object is about to be altered, the named field is consulted. A
   value of 2 means that the object must be duplicated before being
   changed. (Note that this does not say that it is necessary to
   duplicate, only that it should be duplicated whether necessary or not.)
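
The practical consequence of that rule can be seen without looking at NAMED at all. The following is a minimal sketch, assuming nothing beyond base R; set_first is a hypothetical helper. It shows only the visible semantics (an object is never altered through another reference to it), not how many copies a particular R version makes. R 4.0.0 and later use reference counts in place of NAMED, so the internal bookkeeping there differs from what is described in this thread.

```r
# Two names bound to the same value: modifying one must not affect the
# other, so the vector is duplicated before it is written to.
x <- c(1L, 2L, 3L)
y <- x            # second reference to the same object
y[1] <- 99L       # any duplication happens here, not at 'y <- x'
x                 # 1 2 3
y                 # 99 2 3

# Hypothetical helper: a closure that modifies its argument works on a
# duplicate, so the caller's object is untouched.
set_first <- function(v) {
  v[1] <- 0L
  v
}
z <- set_first(x)
x                 # 1 2 3
z                 # 0 2 3
```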

  3. Memory profiling can be helpful in telling you exactly what copies get made.
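
A small sketch of points 1 and 3 together, on a matrix much smaller than the 14100 x 14100 one in the quoted session. The tracemem() part assumes an R build with memory profiling enabled, which is checked with capabilities("profmem"). The exact messages tracemem() prints vary between versions, so none are shown.

```r
a <- matrix(1L, nrow = 200, ncol = 200)    # integer matrix
size_int <- as.numeric(object.size(a))

# tracemem() reports copies of 'a' on the console, if it is available.
has_profmem <- capabilities("profmem")
if (has_profmem) tracemem(a)

a[1, 1] <- 0          # 0 is a double, so the whole matrix is coerced
if (has_profmem) untracemem(a)

typeof(a)             # "double"
size_dbl <- as.numeric(object.size(a))
size_dbl / size_int   # close to 2

b <- matrix(1L, nrow = 200, ncol = 200)
b[1, 1] <- 0L         # integer replacement value: storage type is kept
typeof(b)             # "integer"
```

For the matrix in the quoted session, 14100^2 elements take roughly 758 MiB as integers and roughly 1.5 GiB as doubles, which is in line with the 774m and 1.5g figures reported by top.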

On Thu, 27 Sep 2007, Petr Savicky wrote:

> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
>> For the most part, doing anything to an R object results in its
>> duplication. You generally have to do a lot of work to NOT copy an R
>> object.
>
> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory reported by the top command on Linux may change during
> a session as follows:
> a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
> a[1,1] <- 0 # 3.0g
> gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C code
> and only read it at the R level. So, the above is not a big problem for me
> (at least not now).
>
> However, there is a related thing, which could be a bug. The following
> code determines the value of the NAMED field in the SEXP header of an object:
>
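> #include <R.h>
> #include <Rinternals.h>
>
> /* Return the NAMED field of the SEXP header of 'a' as an R integer. */
> SEXP getnamed(SEXP a)
> {
>     SEXP out;
>     PROTECT(out = allocVector(INTSXP, 1));  /* length-1 integer vector */
>     INTEGER(out)[0] = NAMED(a);
>     UNPROTECT(1);
>     return out;
> }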
>
> Now, consider the following session:
>
> u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
> .Call("getnamed",u) # 1 (OK)
>
> length(u)
> .Call("getnamed",u) # 1 (OK)
>
> dim(u)
> .Call("getnamed",u) # 1 (OK)
>
> nrow(u)
> .Call("getnamed",u) # 2 (why?)
>
> u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
> .Call("getnamed",u) # 1 (OK)
> ncol(u)
> .Call("getnamed",u) # 2 (so, ncol does the same)
>
> Is this a bug?
>
> Petr Savicky.
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu 27 Sep 2007 - 14:48:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 27 Sep 2007 - 22:41:56 GMT.
