Re: [Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu, 17 May 2012 10:46:40 +0100

On 04/05/2012 18:42, Pavel N. Krivitsky wrote:
> Dear R-devel,
>
> While tracking down some hard-to-reproduce bugs in a package I maintain,
> I stumbled on a behavior change between R 2.15.0 and the current R-devel
> (or SVN trunk).
>
> In 2.15.0 and earlier, if you passed a 0-length vector of the right
> mode (e.g., double(0) or integer(0)) as one of the arguments in a .C()
> call with DUP=TRUE (the default), the C routine would be passed NULL
> (the C pointer, not R NULL) in the corresponding argument. The current

Where did you get that from? The documentation says it passes an (e.g.) double* pointer to a copy of the data area of the R vector. There is no change in the documented behaviour ... Now, of course a zero-length area can be at any address, but no particular address is documented anywhere that I am aware of.

> development version instead passes it a pointer to what appears to be
> the memory location immediately following the SEXP that holds the
> metadata for the argument. If the argument has length 0, this is often
> memory belonging to a different R object. (DUP=FALSE in 2.15.0
> appears to have the same behavior as R-devel.)
>
> .C() documentation and Writing R Extensions don't explicitly specify a
> behavior for 0-length vectors, so I don't know if this change is
> intentional, or whether it was a side-effect of the following news item:
>
> .C() and .Fortran() do less copying: arguments which are raw,
> logical, integer, real or complex vectors and are unnamed are not
> copied before the call, and (named or not) are not copied after
> the call. Lists are no longer copied (they are supposed to be
> used read-only in the C code).
>
> Was the change in the empty vector behavior intentional?
>
> It seems to me that standardizing on the behavior of giving the C
> routine NULL is safer, more consistent with other memory-related
> routines, and more convenient: whereas dereferencing a NULL pointer is
> an immediate (and therefore easily traced) segfault, dereferencing an

That's not true, in general.

> invalid pointer that is nevertheless in the general memory area
> allocated to the program often causes subtle errors down the line;
> R_alloc asked to allocate 0 bytes returns NULL, at least on my platform;

Again, undocumented and should not be relied on.
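
The same caveat exists in standard C: what malloc(0) returns is implementation-defined (either a null pointer or a unique pointer that must not be dereferenced). The following is only a sketch of how code can avoid depending on either outcome; copy_doubles is a hypothetical helper written for illustration and is not part of R or of R_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an R API): copy n doubles into a fresh buffer.
 * For n == 0 the returned pointer may or may not be NULL, because
 * malloc(0) is implementation-defined, so success is reported through
 * *ok rather than through the pointer value. */
double *copy_doubles(const double *src, size_t n, int *ok)
{
    assert(ok != NULL);
    double *dst = malloc(n * sizeof *dst);
    /* A NULL result signals failure only when a nonzero size was requested. */
    *ok = (dst != NULL || n == 0);
    if (*ok && n > 0)
        memcpy(dst, src, n * sizeof *dst);
    return dst; /* pass to free() in every case; free(NULL) is a no-op */
}
```

The caller tests *ok and the length, never the pointer itself, so the code behaves the same on platforms that return NULL for a zero-byte request and on those that do not.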

> and the C routine can easily check if a pointer is NULL, but with the
> R-devel behavior, the programmer has to add an explicit way of telling
> that an empty vector was passed.

It's no different from any other vector length: it is easy for careless programmers to read/write off the ends of the allocated area, and this is why in R-devel we have an option to check for that (and of course also what valgrind is good at finding in an instrumented version of R).
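
As an illustration of treating length 0 like any other length, here is a minimal sketch of a routine with a .C()-style signature that receives the length as an explicit argument. The name vec_sum and the R call in the comment are assumptions made up for this example; they do not come from the package or the test case under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine with a .C()-style signature (all arguments are
 * pointers). From R it might be called as
 *   .C("vec_sum", as.double(x), as.integer(length(x)), sum = double(1))$sum
 */
void vec_sum(double *x, int *n, double *sum)
{
    assert(n != NULL && sum != NULL);
    double total = 0.0;
    /* The loop is bounded by *n, so x is never dereferenced when *n == 0,
     * whatever address x happens to hold. */
    for (int i = 0; i < *n; i++)
        total += x[i];
    *sum = total;
}
```

With the length passed alongside the data, the routine needs no special NULL test: the empty case is simply a loop that runs zero times.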

> I've attached a small test case (dotC_NULL.* files) that shows the
> difference. The C file should be built with R CMD SHLIB, and the R file
> calls the functions in the library with a variety of arguments. The
> outputs I get from running
> R CMD BATCH --no-timing --vanilla --slave dotC_NULL.R
> on R 2.15.0, R trunk, and R trunk with my patch (described below) are attached.
>
> The attached patch (dotC_NULL.patch) against the current trunk
> (affecting src/main/dotcode.c) restores the old behavior for DUP=TRUE
> (i.e., 0-length vector -> NULL pointer) and extends it to the DUP=FALSE
> case. It does so by checking if an argument --- if it's of mode raw,
> integer, real, or complex --- to a .C() or .Fortran() call has length 0,
> and, if so, sets the pointer to be passed to NULL and then skips the
> copying of the C routine's changes back to the R object for that
> argument. The additional computing cost should be negligible (i.e.,
> checking if vector length equals 0 and break-ing out of a switch
> statement if so).
>
> The patch appears to work, at least for my package, and R CMD check
> passes for all recommended packages (on my 64-bit Linux system), but
> this is my first time working with R's internals, so handle with care.

That's easy: we will not be changing this. In particular, the new checks I refer to above rely on passing the address of an in-process memory area with guard bytes.
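
To make the guard-byte idea concrete, the following is a rough sketch of the general technique and not the R-devel implementation. The names call_with_guards, GUARD_LEN, GUARD_BYTE and the two sample routines are all hypothetical. The data area is placed between two guard regions filled with a known pattern, and the guards are inspected after the call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  (2 * sizeof(double)) /* assumed guard size; keeps data aligned */
#define GUARD_BYTE 0xAB                 /* assumed fill pattern */

typedef void (*routine_fn)(double *x, int *n);

/* Sample routine that stays within its n elements. */
void well_behaved(double *x, int *n)
{
    for (int i = 0; i < *n; i++)
        x[i] *= 2.0;
}

/* Sample routine with a bug: writes one element past the end. */
void off_by_one(double *x, int *n)
{
    x[*n] = 0.0;
}

/* Returns 1 if both guards are intact after the call, 0 if the routine
 * wrote into a guard, and -1 if the buffer could not be allocated. */
int call_with_guards(routine_fn fn, const double *src, int n)
{
    assert(fn != NULL && n >= 0);
    size_t data_len = (size_t)n * sizeof(double);
    unsigned char *buf = malloc(GUARD_LEN + data_len + GUARD_LEN);
    if (buf == NULL)
        return -1;

    memset(buf, GUARD_BYTE, GUARD_LEN);                       /* leading guard */
    memset(buf + GUARD_LEN + data_len, GUARD_BYTE, GUARD_LEN); /* trailing guard */
    if (n > 0)
        memcpy(buf + GUARD_LEN, src, data_len);

    int len = n; /* private copy, in case the routine modifies it */
    fn((double *)(buf + GUARD_LEN), &len);

    int intact = 1;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (buf[i] != GUARD_BYTE ||
            buf[GUARD_LEN + data_len + i] != GUARD_BYTE)
            intact = 0;
    }
    free(buf);
    return intact;
}
```

In this sketch a zero-length argument still receives a real address between the two guards, so a routine that writes to it is caught; a check of this kind has nothing to inspect if the routine is handed NULL instead.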

> Best,
> Pavel Krivitsky
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu 17 May 2012 - 09:50:14 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 26 May 2012 - 17:22:03 GMT.
