Re: [Rd] modifying large R objects in place

From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>
Date: Thu, 27 Sep 2007 16:43:31 +0200

Petr Savicky wrote:
> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
>
>> For the most part, doing anything to an R object results in its
>> duplication. You generally have to do a lot of work to NOT copy an R
>> object.
>>
>
> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory reported by the top command on Linux may change during
> a session as follows:
> a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
> a[1,1] <- 0 # 3.0g
> gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C code
> and only read it on R level. So, the above is not a big problem for me
> (at least not now).
>
> However, there is a related thing, which could be a bug. The following
> code determines the value of the NAMED field in the SEXP header of an object:
>
> #include <Rinternals.h>
>
> SEXP getnamed(SEXP a)
> {
>     SEXP out;
>     PROTECT(out = allocVector(INTSXP, 1));
>     INTEGER(out)[0] = NAMED(a);
>     UNPROTECT(1);
>     return(out);
> }
>
> Now, consider the following session
>
> u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
> .Call("getnamed",u) # 1 (OK)
>
> length(u)
> .Call("getnamed",u) # 1 (OK)
>
> dim(u)
> .Call("getnamed",u) # 1 (OK)
>
> nrow(u)
> .Call("getnamed",u) # 2 (why?)
>
> u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
> .Call("getnamed",u) # 1 (OK)
> ncol(u)
> .Call("getnamed",u) # 2 (so, ncol does the same)
>
> Is this a bug?
>
No. It is an infelicity.

The issues are that
1. length() and dim() call .Primitive directly, whereas nrow() and ncol() are "real" R functions
2. NAMED records whether an object has _ever_ had 0, 1, or 2+ names

During the evaluation of ncol(u), the argument x is evaluated, and at that point the object "u" is also named "x" in the evaluation frame of ncol(). A full(er) reference-counting system might drop NAMED back to 1 when exiting ncol(), but currently R can only count up (and trying to find the conditions under which it is safe to reduce NAMED will make your head spin, believe me!).
> Petr Savicky.
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

Received on Thu 27 Sep 2007 - 14:46:33 GMT
