Re: [Rd] modifying large R objects in place

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Fri, 28 Sep 2007 17:46:25 +0200

Duncan Murdoch wrote:
> On 9/28/2007 7:45 AM, Petr Savicky wrote:
>

>> On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote:
>>     

> ...
>
>>> Longer-term, I still have some hope for better reference counting, but 
>>> the semantics of environments make it really ugly -- an environment can 
>>> contain an object that contains the environment, a simple example being 
>>>
>>> f <- function()
>>>    g <- function() 0
>>> f()
>>>
>>> At the end of f(), we should decide whether to destroy f's evaluation 
>>> environment. In the present example, what we need to be able to see is 
>>> that this would remove all references to g and that the reference from g 
>>> to f can therefore be ignored.  Complete logic for sorting this out is 
>>> basically equivalent to a new garbage collector, and one can suspect 
>>> that applying the logic upon every function return is going to be 
>>> terribly inefficient. However, partial heuristics might apply.
>>>       
>> I have to say that I do not understand the example very much.
>> What is the input and output of f? Is g inside only defined or
>> also used?
>>     
>

> f has no input; its output is the function g, whose environment is the
> evaluation environment of f. g is never used, but it is returned as the
> value of f. Thus we have the loop:
>

> g refers to the environment.
> the environment contains g.
>

> Even though the result of f() was never saved, two things (the
> environment and g) got created and each would have non-zero reference
> count.
>

> In a more complicated situation you might want to save the result of the
> function and then modify it. But because of the loop above, you would
> always think there's another reference to the object, so every in-place
> modification would require a copy first.
>
>

Thanks Duncan. It was way past my bedtime when I wrote that...

I had actually missed the point about the return value, but the point remains even if you let f return something other than g: you get a situation where the two objects (g and f's evaluation environment) each have a refcount of 1, so by standard refcounting semantics neither can be removed even though neither is reachable.

Put differently, standard refcounting assumes that references between objects of the language form a directed acyclic graph, but when environments are involved, there can be cycles in R-like languages.
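
A minimal sketch of that cycle, for anyone who wants to see both references directly. It assumes a variant of f that returns its evaluation environment; that change is not part of the original example and is made only so the two objects can be inspected from outside.

```r
# Variant of f from the example above; returning the evaluation
# environment is an assumption made here purely for inspection.
f <- function() {
    g <- function() 0
    environment()
}

e <- f()                       # f's evaluation environment
g <- get("g", envir = e)       # the closure defined inside f

# The environment contains g ...
stopifnot(exists("g", envir = e, inherits = FALSE))
# ... and g refers back to the environment.
stopifnot(identical(environment(g), e))
```

Each object holds a reference to the other, so a pure reference count on either one cannot drop to zero on its own, whether or not anything outside the pair still points at them.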

    -p

> Duncan Murdoch

>

> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

Received on Fri 28 Sep 2007 - 15:52:44 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 28 Sep 2007 - 20:42:51 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.