Re: [Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).

From: Pavel N. Krivitsky <krivitsky_at_stat.psu.edu>
Date: Sat, 26 May 2012 15:53:17 -0400

Dear Simon,

On Sat, 2012-05-26 at 14:00 -0400, Simon Urbanek wrote:
> > My suggestion is that in the next release, it ought to be the
> > standard, documented behavior, not just because it's historical, but
> > because it's more convenient and safer.
>
> That is bogus - .C is inherently unsafe wrt vector lengths so talking
> about safety here is IMHO nonsensical. Your "safety" relies on bombing
> the program -

IMHO, not all memory errors are created equal. From the safety perspective, an error that immediately bombs the program is preferable to one that corrupts memory and produces subtle problems much later, or one that reads the wrong memory area and goes into an infinite loop, allocates gigabytes of RAM, etc.

> that is arguably much less safe than using checks that Brian was
> talking about because they are recoverable.

While undoubtedly useful for debugging, I don't think they are particularly recoverable in practice. At best, they tell you that some memory just before or just after the allocated block has been overwritten. They cannot tell you how much memory was overwritten, or whether R is now in an inconsistent state (which may occur if the write is off by more than 64 bytes, I believe) and should therefore be restarted immediately, taking only the time to save the data and history, which is what a caught segfault in R does anyway, at least on UNIX-alikes.

Furthermore, the guard bytes only trigger after the C routine exits, so the error is only caught some time after it occurs, which makes debugging it more difficult. (In contrast, a debugger like GDB can tell exactly which C statement caused a segmentation fault.)

The one advantage guard bytes might have over NULL (for a 0-length vector) is that an error caught by a guard byte might allow the developer to browse (via options(error=recover)) the R function that made the .C() call, but even that relies on the bug not overwriting more than a few bytes, and it cannot detect improper reads.
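
To make the timing issue concrete, here is a rough sketch in C of how a guard-byte check works in general. This is not R's actual implementation: the names call_with_guards, guards_intact, doubles_in_place and overruns_by_one are made up for illustration, and the 64-byte guard size is only my assumption, taken from the figure I mentioned above. The thing to notice is that the check can only run once the routine has returned.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  64    /* assumed guard size, for illustration only */
#define GUARD_BYTE 0xFA  /* arbitrary fill pattern */

/* Hypothetical signature for a routine that works on a vector in place. */
typedef void (*routine_fn)(double *x, int n);

/* Return 1 if both guard regions still hold the fill pattern, else 0. */
static int guards_intact(const unsigned char *buf, size_t payload)
{
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (buf[i] != GUARD_BYTE) return 0;                      /* leading guard */
        if (buf[GUARD_LEN + payload + i] != GUARD_BYTE) return 0; /* trailing guard */
    }
    return 1;
}

/* Run f on a guarded copy of data.
 * Returns 0 if the guards survived, 1 if they were damaged,
 * and -1 if the allocation failed. */
int call_with_guards(routine_fn f, double *data, int n)
{
    size_t payload = (size_t)n * sizeof(double);
    unsigned char *buf = malloc(payload + 2 * GUARD_LEN);
    if (buf == NULL) return -1;

    /* Fill everything with the pattern, then copy the data into the middle. */
    memset(buf, GUARD_BYTE, payload + 2 * GUARD_LEN);
    double *copy = (double *)(buf + GUARD_LEN);
    if (payload > 0) memcpy(copy, data, payload);

    f(copy, n);  /* any overrun happens here and goes unnoticed for now */

    /* Only after the routine has returned can the damage be detected. */
    int ok = guards_intact(buf, payload);
    if (ok && payload > 0) memcpy(data, copy, payload);
    free(buf);
    return ok ? 0 : 1;
}

/* A well-behaved routine: stays within its n elements. */
void doubles_in_place(double *x, int n)
{
    for (int i = 0; i < n; i++) x[i] *= 2.0;
}

/* A buggy routine: writes one element past the end (still inside the guard). */
void overruns_by_one(double *x, int n)
{
    x[n] = 1.0;
}
```

The check reports that a guard was damaged, but not which statement did it or how far the write went. A write that jumps past the 64 guard bytes, or any improper read, would not be noticed at all.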

> You can argue either way, but there is no winner - the real answer is
> use .Call() instead.

It seems to me that the 0-length->NULL approach still dominates on the matter of safety and debugging. The one possible exception is what I am pretty sure is a relatively rare scenario: the developer has passed a 0-length vector via .C() _and_ it was written to _and_ the developer wants to browse (using options(error=recover)) the R code leading up to the problematic .C() call, rather than browse (via GDB) the C code that triggered the segfault. Even in that scenario, the developer can still easily infer which argument was passed as an empty vector and via which .C() call. (Standardizing on 0-length->NULL does not preclude putting guard bytes on non-empty vectors, of course.)

> > From the point of view of programmer convenience, having a 0-length
> > vector on the R side always map to a NULL pointer on the C side provides
> > a useful bit of information that the programmer can use, while a
> > non-NULL pointer to no data isn't useful, and the current R-devel
> > behavior requires the programmer to pass the information about whether
> > it's empty through an additional argument (of which there is an upper
> > limit). For example, if a procedure implemented in C takes optional
> > weights, passing a double(0) that was translated to NULL could be used
> > to signal that there are no weights.
>
> That would be just plain wrong use that certainly should not be
> encouraged - you *have* to pass the length along with any vectors
> passed to .C (that's why you should not be even thinking of using .C
> in the first place!) so it is much safer to check that the length you
> passed is 0 rather than relying on special-casing into NULL pointers.

Not necessarily. In the weighted data scenario, the length of the data vector would, presumably, be passed in a different argument, and, if weights exist, their length would equal that. The NULL here could be a binary signal not to use weights.
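
A minimal sketch of what I have in mind, assuming the 0-length->NULL mapping is in effect (with the current R-devel behavior the pointer would be non-NULL and this would not work). The routine name weighted_sum and its argument layout are hypothetical, chosen only for illustration:

```c
#include <stddef.h>

/* Hypothetical .C()-style routine: every argument arrives as a pointer.
 * x      data vector of length *n
 * n      length of x (and of w, when weights are supplied)
 * w      weights, or NULL if the caller passed double(0) on the R side
 *        (this relies on 0-length vectors being mapped to NULL)
 * result single-element output: the (weighted) sum of x */
void weighted_sum(double *x, int *n, double *w, double *result)
{
    double total = 0.0;
    for (int i = 0; i < *n; i++) {
        /* NULL is the binary "no weights" signal; no extra flag is needed. */
        total += (w != NULL) ? w[i] * x[i] : x[i];
    }
    *result = total;
}
```

On the R side, the caller would pass double(0) in the weights slot when there are no weights, and the length of the data still travels in its own argument. If buggy C code dereferenced w without checking it, the NULL would make it fail immediately rather than read whatever happens to sit at a non-NULL address.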

While I understand that the .Call() interface has many advantages over .C(), .C() remains a simple and convenient interface that doesn't require the developer to learn too much about R's internals. Either way, as long as the .C() interface is not being deprecated, I think that it ought to be made as safe and as useful as possible.

                                 Best,
                                 Pavel

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Sat 26 May 2012 - 20:02:03 GMT
