Re: [Rd] reference counting bug related to break and next in loops

From: William Dunlap <wdunlap_at_tibco.com>
Date: Fri, 12 Jun 2009 12:07:13 -0700

> -----Original Message-----
> From: William Dunlap
> Sent: Wednesday, June 10, 2009 2:35 PM
> To: 'luke_at_stat.uiowa.edu'
> Cc: r-devel_at_r-project.org
> Subject: RE: [Rd] reference counting bug related to break and
> next in loops
>
> Thanks Luke.
>
> I used codetools::walkCode to look for functions that returned the
> value of a loop and found a surprising number in base R.
> However, it looks like non of these functions were written to return
> anything useful (the side effects were important), so the change
> to loop-returns-NULL should let such functions use less
> memory (and time) and make them correspond better to their
> documentation.

I also use walkCode to look for assignents whose right hand sides were loops. The only *.R file in a directory containing a current R build that had such a thing was

   ./library/base/R-ex/print.R
(from help(print)) and related files in ./tests. It had

   rr <- for(i in 1:3) print(1:i)
   rr
That last line used to print

   [1] 1 2 3
and now prints

   NULL
I didn't notice a corresponding *.Rout.save file so the test won't notice the change in behavior.

I've attached the code I used. It is pretty crude. functionReturnsLoopValue("file")
and functionAssignsLoopValue("file") parse "file" and then return a list of functions
that return the value of a loop and a list of assignments involving the value of a loop,
respectively. functionAssignsLoopValue looks for ordinary assignments (x<-while(...))
and for default argument values (function(x=while(...)){}). It does not yet look for
things passed as arguments in function calls, c(1, while(x>0)x<-x-1, 3). The last
might pick up some false positives involving functions that don't evaluate their
arguments.  

>
> E.g., fixup.libraries.URLs is not documented to return anything
> and it returns the value of for loop:
>
> $`src/library/utils/R/windows/linkhtml.R`[[2]]
> function(lib.loc = .libPaths())
> {
> for (lib in lib.loc) {
> if(lib == .Library) next
> pg <- sort(.packages(all.available = TRUE, lib.loc = lib))
> for(pkg in pg)
> try(fixup.package.URLs(file.path(lib,pkg)), silent = TRUE)
> }
> }
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: luke_at_stat.uiowa.edu [mailto:luke_at_stat.uiowa.edu]
> > Sent: Wednesday, June 10, 2009 1:05 PM
> > To: William Dunlap
> > Cc: r-devel_at_r-project.org
> > Subject: Re: [Rd] reference counting bug related to break and
> > next in loops
> >
> > Thanks for the report.
> >
> > It turns out that a similar issue arises in while() loops without
> > break/next being involved because the test expression is evaluated
> > after the final body evaluation. After some discussion we
> decided it
> > was simplest both for implementation and documentation to have the
> > value of a loop expression always be NULL. This is now
> implemented in
> > R-devel.
> >
> > luke
> >
> > On Tue, 2 Jun 2009, William Dunlap wrote:
> >
> > > One of our R users here just showed me the following problem while
> > > investigating the return value of a while loop. I added some
> > > information
> > > on a similar bug in for loops. I think he was using 2.9.0
> > > but I see the same problem on today's development version
> of 2.10.0
> > > (svn 48703).
> > >
> > > Should the semantics of while and for loops be changed
> > slightly to avoid
> > > the memory
> > > buildup that fixing this to reflect the current docs would
> > entail? S+'s
> > > loops return nothing useful - that change was made long
> ago to avoid
> > > memory buildup resulting from semantics akin the R's
> > present semantics.
> > >
> > > Bill Dunlap
> > > TIBCO Software Inc - Spotfire Division
> > > wdunlap tibco.com
> > >
> > > --------------------Forwarded (and edited) message
> > >
> > below---------------------------------------------------------
> > ----------
> > > ----------
> > >
> > > I think I have found another reference counting bug.
> > >
> > > If you type in the following in R you get what I think is
> the wrong
> > > result.
> > >
> > >> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) {
> > break }; i =
> > > i + 1; y}; q
> > > [1] 42 42 42 42 42 42 42 42 9 10
> > >
> > > I had expected [1] 42 42 42 42 42 42 42 8 9 10 which is
> > what you get
> > > if you add 0 to y in the last statement in the while loop:
> > >
> > >> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) {
> > break }; i =
> > > i + 1; y + 0}; q
> > > [1] 42 42 42 42 42 42 42 8 9 10
> > >
> > > Also,
> > >
> > >> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break };
> > > i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration",
> > i, "\n"); y};
> > > q
> > > Completing iteration 2
> > > Completing iteration 3
> > > [1] 42 42 42 42 42 42 42 42 9 10
> > >
> > > but if the last statement in the while loop is y+0 instead
> > of y I get
> > > the
> > > expected result:
> > >
> > >> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break };
> > > i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration",
> i, "\n");
> > > y+0L}; q
> > > Completing iteration 2
> > > Completing iteration 3
> > > [1] 42 42 3 4 5 6 7 8 9 10
> > >
> > > A background to the problem is that in R a while-loop
> > returns the value
> > > of the last iteration. However there is an exception if an
> > iteration is
> > > terminated by a break or a next. Then the value is the
> value of the
> > > previously completed iteration that did not execute a
> break or next.
> > > Thus in an extreme case the value of the while may be the
> > value of the
> > > very first iteration even though it executed a million iterations.
> > >
> > > Thus to implement that correctly one needs to keep a
> > reference to the
> > > value of the last non-terminated iteration. It seems as if
> > the current R
> > > implementation does that but does not increase the
> reference counter
> > > which explains the odd behavior.
> > >
> > > The for loop example is
> > >
> > >> z<-{ tmp<-rep(pi,10);for(i in 1:10){
> tmp[i]<-i^2;if(i==9)break ; if
> > > (i<9&&i>3)next ; tmp } }
> > >> z
> > > [1] 1.000000 4.000000 9.000000 16.000000 25.000000 36.000000
> > > 49.000000
> > > [8] 64.000000 81.000000 3.141593
> > >> z<-{ tmp<-rep(pi,10);for(i in 1:10){
> tmp[i]<-i^2;if(i==9)break ; if
> > > (i<9&&i>3)next ; tmp+0 } }
> > >> z
> > > [1] 1.000000 4.000000 9.000000 3.141593 3.141593 3.141593 3.141593
> > > 3.141593
> > > [9] 3.141593 3.141593
> > >
> > > I can think of a couple of ways to solve this.
> > >
> > > 1. Increment the reference counter. This solves the
> > bug but may
> > > have serious performance implications. In the while
> example above it
> > > needs to copy y in every iteration.
> > >
> > > 2. Change the semantics of while loops by getting rid of the
> > > exception described above. When a loop is terminated with a
> > break the
> > > value of the loop would be NULL. Thus there is no need to keep a
> > > reference to the value of the last non-terminated iteration.
> > >
> > > Any opinions?
> > >
> > > ______________________________________________
> > > R-devel_at_r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> > --
> > Luke Tierney
> > Chair, Statistics and Actuarial Science
> > Ralph E. Wareham Professor of Mathematical Sciences
> > University of Iowa Phone: 319-335-3386
> > Department of Statistics and Fax: 319-335-3017
> > Actuarial Science
> > 241 Schaeffer Hall email: luke_at_stat.uiowa.edu
> > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
> >



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 12 Jun 2009 - 19:11:59 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 12 Jun 2009 - 22:35:44 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive