[Rd] reference counting bug related to break and next in loops

From: William Dunlap <wdunlap_at_tibco.com>
Date: Tue, 02 Jun 2009 20:17:49 -0700

Should the semantics of while and for loops be changed slightly to avoid the memory buildup that fixing this to reflect the current docs would entail? S+'s loops return nothing useful; that change was made long ago to avoid the memory buildup that resulted from semantics akin to R's present semantics.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

-------------------- Forwarded (and edited) message below --------------------

I think I have found another reference counting bug.

If you type in the following in R you get what I think is the wrong result.

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i = i + 1; y}; q
 42 42 42 42 42 42 42 42 9 10

I had expected 42 42 42 42 42 42 42 8 9 10, which is what you get if you add 0 to y in the last statement of the while loop:

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i = i + 1; y + 0}; q
 42 42 42 42 42 42 42 8 9 10

Also,

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", i, "\n"); y}; q
Completing iteration 2
Completing iteration 3
 42 42 42 42 42 42 42 42 9 10

but if the last statement in the while loop is y+0L instead of y, I get the expected result:

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", i, "\n"); y+0L}; q
Completing iteration 2
Completing iteration 3
 42 42 3 4 5 6 7 8 9 10

Some background to the problem: in R a while loop returns the value of its last iteration. The exception is an iteration that is terminated by a break or a next. In that case the value is that of the most recently completed iteration that did not execute a break or next. Thus, in an extreme case, the value of the while loop may be the value of the very first iteration even though the loop executed a million iterations.

Thus, to implement that correctly, one needs to keep a reference to the value of the last non-terminated iteration. It seems that the current R implementation does keep that reference but does not increase the reference counter, which explains the odd behavior.
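
The expected result of the first while example can be reproduced without relying on the loop's own return value. The following is only a sketch of the documented semantics, not of R's internals: the variable `last_completed` is a hypothetical name introduced here, and the assignment to it stands in for the reference that the interpreter would have to hold and protect.

```r
# Sketch: track the value of the last iteration that ran to completion
# (no break/next) in an ordinary variable instead of using the loop value.
# `last_completed` is a hypothetical name, not part of the original report.
i <- 1L
y <- 1:10
last_completed <- NULL

while (TRUE) {
  y[i] <- 42L
  if (i == 8L) break      # this iteration is terminated, so it sets no value
  i <- i + 1L
  last_completed <- y     # at R level this behaves as a snapshot of y
}

print(last_completed)     # value of the 7th (last completed) iteration
print(y)                  # y itself was modified once more before the break
```

Because a later `y[i] <- 42L` must not alter `last_completed`, the two variables cannot keep sharing storage once y is modified again. That is the same copy the message describes as the cost of incrementing the reference counter.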

The for loop example is

> z<-{ tmp<-rep(pi,10);for(i in 1:10){ tmp[i]<-i^2;if(i==9)break ; if (i<9&&i>3)next ; tmp } }
> z

 1.000000 4.000000 9.000000 16.000000 25.000000 36.000000 49.000000
 64.000000 81.000000 3.141593
> z<-{ tmp<-rep(pi,10);for(i in 1:10){ tmp[i]<-i^2;if(i==9)break ; if (i<9&&i>3)next ; tmp+0 } }
> z

 1.000000 4.000000 9.000000 3.141593 3.141593 3.141593 3.141593 3.141593
 3.141593 3.141593

I can think of a couple of ways to solve this.

1. Increment the reference counter. This solves the bug but may have serious performance implications. In the while example above, y would need to be copied in every iteration.
2. Change the semantics of while loops by getting rid of the exception described above. When a loop is terminated with a break, the value of the loop would be NULL. Thus there would be no need to keep a reference to the value of the last non-terminated iteration.
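
If the second option were adopted, code that currently relies on the value of a loop would have to capture that value itself. A minimal sketch for the for loop example above, assuming nothing about what the loop returns; `last_value` is a hypothetical name used only for illustration:

```r
# Sketch for option 2: ignore the loop's own value and record the value of
# each completed iteration explicitly. `last_value` is a hypothetical name.
tmp <- rep(pi, 10)
last_value <- NULL

for (i in 1:10) {
  tmp[i] <- i^2
  if (i == 9) break              # terminated iteration: nothing recorded
  if (i < 9 && i > 3) next       # skipped iterations: nothing recorded
  last_value <- tmp              # only iterations 1 to 3 reach this line
}

z <- last_value
print(z)
```

Here the copy is made only in the iterations that actually complete, and only in code that asks for it, so loops whose value is never used pay nothing.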

Any opinions?

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 03 Jun 2009 - 03:21:51 GMT
