Re: R-alpha: assignment scoping

Luke Tierney (luke@stat.umn.edu)
Thu, 30 May 1996 12:16:56 -0500 (CDT)


From: Luke Tierney <luke@stat.umn.edu>
Message-Id: <9605301716.AA04081@nokomis.stat.umn.edu>
Subject: Re: R-alpha: assignment scoping
To: wvenable@attunga.stats.adelaide.edu.au (Bill Venables)
Date: Thu, 30 May 1996 12:16:56 -0500 (CDT)
In-Reply-To: <9605290321.AA14412@attunga.stats.adelaide.edu.au> from "Bill Venables" at May 29, 96 12:51:27 pm

Bill Venables wrote:
> 
>  > 3) Claim that x[1] <- 3 is an error in this context.  Attempting to
>  >    mutate a local x when there is no local x seems like a rather
>  >    odd thing to want to do.
> 
> This is my preferred option even if S does do something different.  
> 

As I understand the S semantics, the expression

	x[1] <- 3

is syntactic sugar for

	x <- "[<-"(x,1,3)

It may be implemented differently for efficiency reasons, but this is
what it is supposed to mean. This is the basis for the general
assignment mechanism in which

	f(x,...) <- y

means

	x <- "f<-"(x,...,y)
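To make the desugaring concrete, here is a small sketch in present-day
R syntax. The replacement function `second<-` and the variables v and w
are made up for illustration; note that R requires the last argument of
a replacement function to be named `value`.

```r
# Subset assignment and its desugared form give the same result.
x <- c(10, 20, 30)
x[1] <- 3

y <- c(10, 20, 30)
y <- `[<-`(y, 1, 3)

same_builtin <- identical(x, y)

# A user-defined replacement function (hypothetical name `second<-`).
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(1, 2, 3)
second(v) <- 99                          # sugar

w <- c(1, 2, 3)
w <- `second<-`(w, value = 99)           # what the sugar stands for

same_user <- identical(v, w)
```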

If the expression

	x <- g(x,1,3)

occurs in a function then we know what it means:

	If a local variable "x" exists, make its value the result of g(x,1,3).
	If not, create a new local variable "x" with value g(x,1,3).

Since this is true for any g, I would expect it to be so if g is the
function assigned to the global variable named "[<-". So it seems to
me that for the semantics to be consistent you either need to do
things the way S does, or change the interpretation of <-. It would be
difficult (or at least awkward) to require

	x[1]<-3

to refer to a "local x unless there is a global declaration" without
making the same requirement of

	x<-3

and if you do that, then you have no way to create local variables.
It wouldn't be too unreasonable to have some tool that would examine a
function and issue a warning when it sees x[1]<-3 when x isn't known
to be local.
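As a sketch of what the S-style reading amounts to in practice, the
following shows the behaviour as I understand it in present-day R: the
right-hand side reads the global x, and the result is bound to a new
local x. The names f and res are only for illustration.

```r
x <- c(10, 20, 30)

f <- function() {
  # No local x exists yet. The assignment behaves like
  #   x <- `[<-`(x, 1, 3)
  # so the global x is read and a new local x is created.
  x[1] <- 3
  x
}

res <- f()   # value of the local x inside f
# The global x is not modified by the call.
```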

The problem, from a language design point of view, is that in S "<-"
plays two roles: In addition to assignment to change the value of a
variable it is also responsible for creating variables. Most other
languages separate these features, usually by requiring a declaration
before use or by using some form of let construct, e.g. in Lisp

	(let ((x 3))     ; creates the binding
	  (setf x 4))    ; changes the value

(Other functional languages like ML also have let forms, but of course
don't have assignment to variables.) It would be possible to do
something along those lines within R syntax, say

	let (x=3) {
	  x<-y
	}

With lexical scope this would be semantically equivalent to, but much
clearer than,

	(function(x) { x<-y })(3)
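Since the let form above is only a proposal, here is a runnable sketch
of the function-application equivalent, with the block made to return x
so the effect is visible. The value given to y and the names res and
res2 are assumptions for illustration. Present-day R also provides
local(), which gives a similar scoped block.

```r
y <- 10

# x is bound to 3 by the call, then changed to the value of y.
res <- (function(x) { x <- y; x })(3)

# A similar scoped block using base R's local().
res2 <- local({
  x <- 3    # creates the binding
  x <- y    # changes the value
  x
})
```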

Having a construct like this would tend to make programs clearer and
would eliminate the need for separate <- and <<- operators.  (It might
be useful to warn if an assignment is made to a global variable.)  But
this would be very different from S.

The overloading of assignment and binding creates some peculiar
situations. For example,

	function(x) {
	  if (x) y<-3;
	  y
	}

Is the final y global or local? You can't tell by looking at the
function, i.e. by a static analysis -- it depends on what value of x
is passed to the function at run time. This makes understanding a
function harder for human readers as well as for programs that try to
read the function, such as compilers. A compiler would like to replace
all references to local variables by direct accesses to their
pre-computed storage locations. But that isn't possible here, since
the semantics don't tell you whether y is local or free at compile
time. Mix this with the previous example, and you get

	function(x) {
	  if (x) y<-3;
	  y[1] <- 4;
	  y
	}
	
Whether you should complain about the assignment depends on the
runtime value of x.
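Both functions can be run to see the dependence on the runtime value
of x. This is a sketch; the global value given to y and the names f1
and f2 are assumptions for illustration.

```r
y <- c(1, 2)   # a global y

f1 <- function(x) {
  if (x) y <- 3
  y                 # local y if x is TRUE, global y otherwise
}

f2 <- function(x) {
  if (x) y <- 3
  y[1] <- 4         # mutates a local y, or copies the global one first
  y
}

r1 <- f1(TRUE)      # 3, the local y
r2 <- f1(FALSE)     # 1 2, the global y
r3 <- f2(TRUE)      # 4, the local y after modification
r4 <- f2(FALSE)     # 4 2, a modified local copy of the global y
```

In every case the global y itself is left unchanged.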

Adding lexical scope into the mix complicates life a bit as
well. Consider

	function() {
	  g<-function() y;
	  z <- g()
	  y<-3
	  c(z, g())
	}

Which y does/should g use in its two calls, global or local? Here is
what R currently does:

> f<-function() {
          g<-function() y;
          z <- g()
          y<-3
          c(z, g())
        }
> y<-1
> f()
[1] 1 3

The first call uses the global value and the second the one local to
f.

If local variables had to be established with a let, the meaning
would be unambiguous: in

	function() {
	  g<-function() y;
	  z <- g()
	  let (y=3) {
	    c(z, g())
	  }
	}

the y in g would be the global one in both calls since the binding
surrounding the second call to g isn't visible to g when g is defined.
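The let form is hypothetical, but the same scoping can be sketched in
runnable R by writing the let as an immediately applied function, as
in the earlier equivalence. The global value of y and the name res are
assumptions for illustration.

```r
y <- 1

f <- function() {
  g <- function() y
  z <- g()
  # Stand-in for: let (y=3) { c(z, g()) }
  # The binding y = 3 lives in the inner function's frame, which is
  # not part of g's defining environment.
  (function(y) c(z, g()))(3)
}

res <- f()   # both calls to g see the global y
```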

Since lexical scope is specific to R rather than inherited from S, the
semantics could be changed so that the environment used by g consists
of only those bindings that existed at the time that g was created.
That might be more natural in some ways but would have a significant
drawback: you could no longer define local recursive or mutually
recursive functions -- the current approach allows that.
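A small sketch of the local recursion that the current approach
permits. When is_even is created, neither is_odd nor (strictly) is_even
itself is bound yet; the calls work because the names are looked up in
f's frame at call time. All function names here are made up for
illustration.

```r
f <- function(n) {
  # Mutually recursive local functions.
  is_even <- function(k) if (k == 0) TRUE else is_odd(k - 1)
  is_odd  <- function(k) if (k == 0) FALSE else is_even(k - 1)

  # A directly recursive local function.
  fact <- function(k) if (k <= 1) 1 else k * fact(k - 1)

  list(even = is_even(n), factorial = fact(n))
}

res <- f(4)
```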

Separating binding creation from binding mutation would make for a
cleaner language design, but it would be fundamentally incompatible
with S. The current approach is a bit awkward since one cannot
determine by static analysis whether a variable is bound or free for
all variables. But one can separate variable references into ones that
are definitely locally bound, definitely free, and ambiguous, and for
most functions the unambiguous cases will predominate.
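A rough sketch of such a classification in R. It is deliberately
conservative and is not a full analysis: arguments count as definitely
local, names never assigned count as definitely free, and every other
assigned name is reported as ambiguous even when a flow analysis could
prove it local. The helpers assigned_names and classify are
hypothetical, and the sketch ignores for-loop variables, nested
function bodies, and empty subscripts such as x[1, ].

```r
# Collect the variables that appear as targets of <- or =.
assigned_names <- function(e) {
  if (!is.call(e)) return(character(0))
  out <- character(0)
  op <- e[[1]]
  if (is.name(op) && as.character(op) %in% c("<-", "=")) {
    target <- e[[2]]
    # For x[1] <- v or f(x) <- v, descend to the variable modified.
    while (is.call(target)) target <- target[[2]]
    out <- as.character(target)
  }
  for (arg in as.list(e)[-1]) out <- c(out, assigned_names(arg))
  unique(out)
}

classify <- function(fun) {
  args     <- names(formals(fun))
  used     <- all.vars(body(fun))       # variable names, not call heads
  assigned <- assigned_names(body(fun))
  list(
    local     = intersect(used, args),
    free      = setdiff(used, c(args, assigned)),
    ambiguous = setdiff(intersect(used, assigned), args)
  )
}

res <- classify(function(x) {
  if (x) y <- 3
  y[1] <- 4
  z + y
})
```

For this function the sketch reports x as local, z as free, and y as
ambiguous.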

Well, almost. Then there is the explicit availability of frames.  This
posting is already long enough -- I'll save that one for later.

luke


