Re: [Rd] modifying large R objects in place

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Fri, 28 Sep 2007 00:39:30 +0200

Petr Savicky wrote:
> Thank you very much for all the explanations. In particular for pointing
> out that nrow is not a .Primitive unlike dim, which is the
> reason for the difference in their behavior. (I raised the question
> of possible bug due to this difference, not just being unsatisfied
> with nrow). Also, thanks for:
>
> On Thu, Sep 27, 2007 at 05:59:05PM +0100, Prof Brian Ripley wrote:
> [...]
>
>> 2) I expected NAMED on 'a' to be incremented by nrow(a): here is my
>> understanding.
>>
>> When you called nrow(a) you created another reference to 'a' in the
>> evaluation frame of nrow. (At a finer level you first created a promise
>> to 'a' and then dim(x) evaluated that promise, which did SET_NAMED(<SEXP>)
>> = 2.) So NAMED(a) was correctly bumped to 2, and it is never reduced.
>>
>> More generally, any argument to a closure that actually gets used will
>> get NAMED set to 2.
>>
> [...]
>
> This explains a lot.
>
> I appreciate also the patch to matrix by Henrik Bengtsson, which saved
> me time formulating a further question just about this.
>
> I do not know whether there is a reason to keep nrow and ncol from being
> .Primitive, but if there is, the problem may be solved by rewriting
> them as follows:
>
> nrow <- function(...) dim(...)[1]
> ncol <- function(...) dim(...)[2]
>
> At least in my environment, the new versions preserved NAMED == 1.
>
Yes, but changing the formal arguments is a bit messy, is it not?

Presumably, nrow <- function(x) eval.parent(substitute(dim(x)[1])) works too, but if the gain is important enough to warrant that sort of programming, you might as well make nrow a .Primitive.
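
For reference, the two closure-based variants discussed so far can be put side by side. This is only a sketch: the names nrow_dots and nrow_subst are made up here so that base::nrow is not masked, and the code checks only that both return the same value as nrow. Whether either one actually leaves NAMED at 1 depends on the R version and is not verified by it.

```r
# Hypothetical names; base::nrow is left untouched.

# Petr Savicky's proposal: forward the arguments straight to the primitive dim().
nrow_dots <- function(...) dim(...)[1]

# The variant from this message: build the call dim(<actual argument>)[1]
# with substitute() and evaluate it in the caller's frame, so the promise
# for x is never forced inside the closure.
nrow_subst <- function(x) eval.parent(substitute(dim(x)[1]))

a <- matrix(1:6, nrow = 2)
res_dots  <- nrow_dots(a)
res_subst <- nrow_subst(a)
```

Both results should be identical to nrow(a); the difference lies only in how the argument reaches dim().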

Longer-term, I still have some hope for better reference counting, but the semantics of environments make it really ugly -- an environment can contain an object that contains the environment, a simple example being

f <- function() {
    g <- function() 0
}
f()

At the end of f(), we should decide whether to destroy f's evaluation environment. In the present example, what we need to be able to see is that this would remove all references to g and that the reference from g to f can therefore be ignored. Complete logic for sorting this out is basically equivalent to a new garbage collector, and one can suspect that applying the logic upon every function return is going to be terribly inefficient. However, partial heuristics might apply.
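
The cycle can be made visible with a slightly modified version of the example. Note the change: this f returns its own evaluation environment (the environment() call is added purely for illustration), so here the frame deliberately survives the call, unlike in the example above.

```r
f <- function() {
    g <- function() 0
    environment()   # added for illustration: hand back f's evaluation frame
}

e <- f()

# The frame holds g ...
has_g <- exists("g", envir = e, inherits = FALSE)

# ... and g's enclosing environment is that same frame, which closes the loop.
is_cycle <- identical(environment(e$g), e)
```

Both has_g and is_cycle come out TRUE: the frame refers to g and g refers back to the frame, which is the kind of loop a plain reference count cannot resolve on its own.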

> It has a side effect that this unifies the error messages generated
> by too many arguments to nrow(x) and dim(x). Currently
> a <- matrix(1:6,nrow=2)
> nrow(a,a) # Error in nrow(a, a) : unused argument(s) (1:6)
> dim(a,a) # Error: 2 arguments passed to 'dim' which requires 1
>
> Maybe other solutions exist as well.
>
> Petr Savicky.
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

Received on Thu 27 Sep 2007 - 22:42:26 GMT