Re: R-beta: S Compatibility (again)

Bill Venables (wvenable@attunga.stats.adelaide.edu.au)
Mon, 13 Apr 1998 12:41:59 +0930


Date: Mon, 13 Apr 1998 12:41:59 +0930
Message-Id: <9804130311.AA23184@attunga.stats.adelaide.edu.au>
From: Bill Venables <wvenable@attunga.stats.adelaide.edu.au>
To: Thomas Lumley <thomas@biostat.washington.edu>
Subject: Re: R-beta: S Compatibility (again)
In-Reply-To: <Pine.LNX.3.96.980412094924.517B-100000@buffy>

My thanks to Thomas and Peter for their rapid responses to my
note.  I wish to make it clear I am not complaining but simply
trying to sort out the compatibility situation.  That helps in
working out which projects are possible in R and which are not.

With that in mind, let me offer a few definitely non-complaining
replies...

Thomas Lumley writes:
 > On Sun, 12 Apr 1998, Bill Venables wrote:
 > > 
 > > 3. Language manipulation within R seems to be impossible.

That's too strong and I withdraw.

 > > 
 > >    To be more specific, the R substitute() is much more
 > >    limited than the S version and coercion to mode "{",
 > >    "call" or "function" are unavailable, and function
 > >    objects are not subsetable [and hence not modifiable].

That still stands, though.

 > >    For example, as far as I can see it is impossible to
 > >    write a version of the S function deriv() for R, short of
 > >    getting down to brass tacks and writing a new primitive
 > >    into the base code, but even then you can't extend it, of
 > >    course.

To my surprise there is indeed a deriv() in R as Thomas points
out, which works well and very fast, but it is as I said it would
have to be, a new primitive (D, in fact; deriv is an .Internal)
and it does suffer the limitation I feared, that you cannot
extend it short of re-writing the C code.  If I want to find a
derivative of an expression involving, for example, something as
common as pnorm or dnorm, -->||| (brick wall).  OK, it's possible
to use Maple, &c &c, but R seems like it is tantalizingly close.

<aside>
The way deriv is handled in S is also quite clumsy and
difficult to extend, but it is possible (and a certain notorious
yellow book gives hints on how to do so).  It seems to me that a
far better and more `object oriented' way of handling it would
have been to allow users to write functions of class
"differentiable", say, with a "deriv" attribute which would give
the code fragments necessary for handling the symbolic
differentiation operation.  This would allow relatively easy and
non-invasive user extensibility, and instead of the gigantic switch
statements (or case statements in the C code) the problem would
effectively be handled by method dispatch.  [My suspicion now is
that every time I see a big case or switch statement I am looking
at pre-OOP code, (or "OOPs-a-daisy" code as we used to call it,
heh heh...)]
</aside>
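
A minimal sketch of the idea in the aside, in R syntax and
assuming S3-style dispatch with attributes allowed on functions.
The names differentiable() and derivative() are hypothetical, and
the "deriv" attribute holds a plain function here rather than a
code fragment, purely to keep the sketch short:

```r
# Hypothetical constructor: tag a function with class "differentiable"
# and attach its derivative as the "deriv" attribute.
differentiable <- function(f, df) {
  structure(f, deriv = df, class = c("differentiable", "function"))
}

# Generic: method dispatch takes the place of a central switch statement.
derivative <- function(f, ...) UseMethod("derivative")

# Method for tagged functions: look the derivative up on the object itself.
derivative.differentiable <- function(f, ...) attr(f, "deriv")

# Fallback for functions that carry no derivative information.
derivative.default <- function(f, ...) {
  stop("no derivative registered for this function")
}

# A user registers pnorm without touching any central code.
Phi  <- differentiable(pnorm, dnorm)
dPhi <- derivative(Phi)
dPhi(0)   # same value as dnorm(0)
```

Registering another function then takes one further call to
differentiable(); nothing in derivative() has to change.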

 > While language manipulation is much more limited than in S,
 > you probably could write a version of deriv() (if it didn't
 > already exist) since expressions can be manipulated quite
 > well.

The main problems are that substitute() is not a general
substitution tool as it is in S, and while expressions can be
manipulated, functions it seems cannot.  deriv can build a
function but you can't.  The following is pretty uncompromising:

> as.function
function (x) 
stop("mode function cannot be assigned")

Without functions you really have not made an object that can be
called part of the language, in my view.  It's a big limitation.
(See below for some specifics.)

Peter Dalgaard writes:
 > Bill Venables <wvenable@stats.adelaide.edu.au> writes:
 > 
 > > R:                                 S:
 > >
 > > > substring(n, 0, nchar(n))        > substring(n, 0, nchar(n))
 > > [1] "@" ""  ""  ""                 [1] ""  "a" "b" "c"

What you see here is email transmission damage.  You don't get
"@" but "\300".

 > Should get fixed - or R should protest about an invalid
 > argument.  If for nothing else, then because
 > 
 > > substring(n, -2000, nchar(n))
 > Segmentation fault (core dumped)

A much more cogent argument.  

However what is the status of "" (nul) as a 'character'?  Is the
S version itself completely consistent?  What would you expect to
happen in response to substring("", 1, 0)?  These are not
rhetorical questions, I just don't know.

 > (Current devel. snapshot, but most likely the same in the other
 > versions.) 

It is, at least on old Linux machines.

 > > 3. Language manipulation within R seems to be impossible.  I
....
 > As Thomas already noted, you can in fact do a substantial
 > amount of expression manipulation, but some things aren't
 > quite the same as in S.  It might be useful if you could
 > provide some examples of things that (you think) can be done
 > in S but not in R.

Here is a cute example of what can be done in S but not in R.
Make a function for the pdf of an order statistic.

> pdf.order <- function(n, r, pfun, dfun) {
  con <- round(exp(lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1)))
  substitute(
    function(x) {
      Fx <- p(x)
      K*Fx^r1*(1 - Fx)^nr*f(x)
    }, 
    list(p = substitute(pfun), f = substitute(dfun), 
         r1 = r-1, nr = n-r, K = con)
  )
}
> pdf.order(9, 5, pnorm, dnorm)
function(x)
{
        Fx <- pnorm(x)
        630 * Fx^4 * (1 - Fx)^4 * dnorm(x)
}

The substitute()s to get unevaluated arguments do work but the
one to modify the function definitely does not.  substitute() is
a very different kind of function in R from what it is in S.
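
For comparison, a sketch of the same density built as an ordinary
closure, which relies only on R's lexical scoping and not on
substitute().  The name pdf.order.closure is hypothetical.  The
returned function does not print with pnorm, dnorm and the
constants filled in, as the S result does, so this is a
workaround rather than an equivalent:

```r
# Hypothetical closure-based variant: n, r, pfun, dfun and con are found
# in the enclosing environment when the inner function is called.
pdf.order.closure <- function(n, r, pfun, dfun) {
  # n! / ((r - 1)! (n - r)!), computed on the log scale as in pdf.order
  con <- round(exp(lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1)))
  function(x) {
    Fx <- pfun(x)
    con * Fx^(r - 1) * (1 - Fx)^(n - r) * dfun(x)
  }
}

f <- pdf.order.closure(9, 5, pnorm, dnorm)

# The constant for n = 9, r = 5 is 9!/(4! 4!) = 630, as in the S output.
round(exp(lgamma(10) - lgamma(5) - lgamma(5)))

# Sanity check: a density should integrate to (about) 1.
integrate(f, -Inf, Inf)
```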

More seriously, though, I learned of this while trying to port my
function glm.nb, a negative binomial model fitting function from
S to R.  It uses substitute() like this in quite a fundamental
way within an iteration and as far as I can see that method
simply cannot be emulated in R because of the impossibility of
modifying functions.  Dommage.

 > Data.dump() has been on several peoples wishlist for a while -
 > as far as I can see, it's simply an efficient representation
 > of the output of dput(), so again most likely fairly easy,
 > once someone finds the time to sit down with a sample dump
 > file and do the actual coding.

I agree it does not seem like it should be too difficult.  It is
very important, though, since it allows transfer of objects
between systems that does not have to be parsed as it is read,
and for which the backout required is only very limited.

Can I make the plea, though, that when someone does get round to
looking at it, the result be fully compatible with S,
including the representation of non-printable characters in a
printable (and hence emailable) form?  This should really be a
fundamental part of the design specification.  Data transfer and
elementary object transfer between R and S should be a smooth
operation and data.dump and data.restore are all about efficient,
portable transfer.

 > Objects(), however, owes some of its differences from S to the
 > different scoping rules, so I suspect that it can never have
 > the same semantics.

You may be right but frankly this surprises me.  Both systems
have a search path but objects(2) in S has to be written
objects(pos=2) in R.  I don't think that has much to do with
scoping.

Regards,
Bill

-- 
Bill Venables, Head, Dept of Statistics,    Tel.: +61 8 8303 5418
University of Adelaide,                     Fax.: +61 8 8303 3696
South AUSTRALIA.     5005.   Email: Bill.Venables@adelaide.edu.au
