Re: [Rd] Changing arguments inside .Call. Wise to encourage "const" on all arguments?

From: Simon Urbanek <simon.urbanek_at_r-project.org>
Date: Mon, 10 Dec 2012 16:48:40 -0500

On Dec 10, 2012, at 2:05 PM, Simon Urbanek wrote:

> 
> On Dec 10, 2012, at 1:51 AM, Paul Johnson wrote:
> 

>> I'm continuing my work on finding speedups in generalized inverse
>> calculations in some simulations. It leads me back to .C and .Call,
>> and to some questions I've never been able to answer for myself. It may
>> be that I can push some calculations to LAPACK or C BLAS; that's why I
>> realized again that I don't understand the call-by-reference or
>> call-by-value semantics of .Call.
>>
>> Why aren't users of .Call encouraged to "const" their arguments, and
>> why doesn't .Call do this for them (if we really believe in return by
>> value)?
>>
> 
> Because there is a difference between the *data* part of the SEXP and the object itself. The internal structure of the object may need to be modified in a call to the R API (e.g. the NAMED reference count is increased when you assign it). You can't flag the data part as const separately, so you have to use a non-const SEXP.
> 
> 
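A toy C model of that distinction, as a sketch only: `toy_obj`, `toy_mark_assigned` and `toy_sum` are hypothetical names, and the struct is far simpler than R's real object layout. It assumes, as above, that an API call may need to write bookkeeping into the object header while user code only reads the payload. In real R code the matching read-only idiom would be `const double *px = REAL(x);`, which is valid because `REAL()` returns a plain `double *`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an R object (hypothetical, not R's real layout):
   a header the interpreter updates plus a data payload. */
typedef struct toy_obj {
    int named;       /* header bookkeeping, loosely modelled on NAMED (0, 1, 2) */
    size_t length;   /* number of elements in the payload */
    double *data;    /* payload: the values user code cares about */
} toy_obj;

/* Toy "API" call that must write to the header, as assignment does in R.
   It cannot accept a const toy_obj *, because it modifies obj->named. */
static void toy_mark_assigned(toy_obj *obj)
{
    assert(obj != NULL);
    if (obj->named < 2)
        obj->named++;   /* saturates at 2, meaning "possibly shared" */
}

/* User code can still promise not to touch the payload by taking a
   pointer to const data, even though the handle itself is non-const. */
static double toy_sum(toy_obj *x)
{
    const double *px = x->data;   /* read-only view of the data part */
    double s = 0.0;
    for (size_t i = 0; i < x->length; i++)
        s += px[i];
    return s;
}
```

One C subtlety is worth adding: `SEXP` is a pointer typedef, so a parameter declared `const SEXP x` has type `struct SEXPREC *const`. That only stops the local variable `x` from being reassigned; it says nothing about the object it points to. Protecting the object itself would need a pointer-to-const object, which is exactly what a header-updating call such as `toy_mark_assigned` cannot accept.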

>> R Gentleman's R Programming for Bioinformatics is the most
>> understandable treatment I've found on .Call. It appears to me .Call
>> leaves "wiggle room" where there should be none. Here's Gentleman on
>> p. 185. "For .Call and .External, the return value is an R object (the
>> C functions must return a SEXP), and for these functions the values
>> that were passed are typically not modified. If they must be
>> modified, then making a copy in R, prior to invoking the C code, is
>> necessary."
>>
>> I *think* that means:
>>
>> .Call allows return by reference, BUT we really wish users would not
>> use it. Users can damage R data structures that are pointed to unless
>> they really truly know what they are doing on the C side. ??
>>
>> This seems dangerous. Why allow return by reference at all?
>>
> 
> Because it is completely legal to do things like
> 
> SEXP last(SEXP bar) {
>   if (TYPEOF(bar) == VECSXP && LENGTH(bar) > 0)
>     return VECTOR_ELT(bar, LENGTH(bar) - 1);
>   Rf_error("sorry, I only work on lists");
>   return R_NilValue; /* not reached: Rf_error() does not return */
> }
> 

Martin Morgan pointed out that this example is a bad one, which is true. The common idiom that is safe is:

SEXP foo(SEXP bar) {
    ...
    return bar;
}

However, the last() example above is bad, because returning the element directly is a bad idea: the conservative approach would be to use duplicate(); the more efficient one would be to bump up NAMED. Sorry, my bad. I guess I was rather strengthening Paul's point: duplicate() when in doubt, even if it's less efficient :).
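Both fixes for `last()` can be sketched on a toy object. The names below are hypothetical and nothing here calls R; in the R API of this period the corresponding tools would be `duplicate()` and raising NAMED (for example with `SET_NAMED(elt, 2)`). The copy is always safe; the shared return is cheaper but only works if every later writer checks the mark before modifying.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object standing in for an R vector. */
typedef struct toy_obj {
    int named;       /* 0 = fresh, 1 = bound once, 2 = possibly shared */
    size_t length;
    double *data;
} toy_obj;

/* Toy duplicate(): deep copy with a fresh header. Returns NULL on failure. */
static toy_obj *toy_duplicate(const toy_obj *src)
{
    toy_obj *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    size_t bytes = src->length * sizeof *copy->data;
    copy->data = malloc(bytes > 0 ? bytes : 1);   /* avoid malloc(0) */
    if (copy->data == NULL) {
        free(copy);
        return NULL;
    }
    if (bytes > 0)
        memcpy(copy->data, src->data, bytes);
    copy->named = 0;
    copy->length = src->length;
    return copy;
}

static void toy_free(toy_obj *obj)
{
    if (obj != NULL) {
        free(obj->data);
        free(obj);
    }
}

/* Conservative last(): hand back a private copy of the final element. */
static toy_obj *toy_last_copy(toy_obj **elts, size_t n)
{
    assert(n > 0);
    return toy_duplicate(elts[n - 1]);
}

/* Efficient last(): share the element but mark it as possibly shared,
   so code that later wants to modify it knows it must copy first. */
static toy_obj *toy_last_shared(toy_obj **elts, size_t n)
{
    assert(n > 0);
    toy_obj *elt = elts[n - 1];
    elt->named = 2;
    return elt;
}
```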

Cheers,
Simon

> There is no point in duplicating the element.
> 
> 
> 

>> On p. 197, there's a similar comment "Any function that has been
>> invoked by either .External or .Call will have all of its arguments
>> protected already. You do not need to protect them. .... [T]hey were
>> not duplicated and should be treated as read-only values."
>>
>> "should be ... read-only" concerns me. They are "protected" in the
>> garbage collector sense,
> 
> Yes
> 
> 

>> but they are not protected from "return by
>> reference" damage. Right?
>>
> 
> There is no "return by reference damage".
> 
> The only problem is if you modify input arguments while someone else holds a reference, but there is no way in C to prevent that while still allowing them to be useful. Note that it is legal to modify input arguments if there are no references to them.
> 
> Cheers,
> Simon
> 
> 
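A minimal sketch of the rule "modify in place only if nobody else holds a reference", again on a hypothetical toy object. It treats `named == 0` as "unreferenced", roughly the condition behind the `if (NAMED(x)) x = duplicate(x);` pattern seen in R's C code; the toy names and the exact threshold are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object standing in for an R vector. */
typedef struct toy_obj {
    int named;       /* 0 = nobody else can hold a reference */
    size_t length;
    double *data;
} toy_obj;

/* Toy duplicate(): deep copy with a fresh header. Returns NULL on failure. */
static toy_obj *toy_duplicate(const toy_obj *src)
{
    toy_obj *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    size_t bytes = src->length * sizeof *copy->data;
    copy->data = malloc(bytes > 0 ? bytes : 1);   /* avoid malloc(0) */
    if (copy->data == NULL) {
        free(copy);
        return NULL;
    }
    if (bytes > 0)
        memcpy(copy->data, src->data, bytes);
    copy->named = 0;
    copy->length = src->length;
    return copy;
}

/* Scale every element by k. Modify x in place only when it is
   unreferenced; otherwise copy first and modify the copy. */
static toy_obj *toy_scale(toy_obj *x, double k)
{
    assert(x != NULL);
    toy_obj *target = x;
    if (x->named > 0) {            /* someone may hold a reference */
        target = toy_duplicate(x);
        if (target == NULL)
            return NULL;
    }
    for (size_t i = 0; i < target->length; i++)
        target->data[i] *= k;      /* safe under the toy's named == 0 rule */
    return target;
}
```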

>> Why doesn't the documentation recommend function writers to mark
>> arguments to C functions as const? Isn't that what the return by
>> value policy would suggest?
>>
>> Here's a troublesome example in R src/main/array.c:
>>
>> /* DropDims strips away redundant dimensioning information. */
>> /* If there is an appropriate dimnames attribute the correct */
>> /* element is extracted and attached to the vector as a names */
>> /* attribute. Note that this function mutates x. */
>> /* Duplication should occur before this is called. */
>>
>> SEXP DropDims(SEXP x)
>> {
>>     SEXP dims, dimnames, newnames = R_NilValue;
>>     int i, n, ndims;
>>
>>     PROTECT(x);
>>     dims = getAttrib(x, R_DimSymbol);
>>     [... SNIP ....]
>>     setAttrib(x, R_DimNamesSymbol, R_NilValue);
>>     setAttrib(x, R_DimSymbol, R_NilValue);
>>     setAttrib(x, R_NamesSymbol, newnames);
>>     [... SNIP ....]
>>
>>     return x;
>> }
>>
>> Well, at least there's a warning comment with that one. But it does
>> show .Call allows call by reference.
>>
>> Why allow it, though? DropDims could copy x, modify the copy, and return it.
>>
>> I wondered why DropDims bothers to return x at all. We seem to be
>> using modify and return by reference there.
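One plausible reason for `DropDims()` returning `x` (an inference, not something its comment states) is composition: a mutating function that returns its argument can be nested inside an expression, and the copy-then-modify variant suggested above becomes a one-line wrapper around it. The toy types and names below are hypothetical; the real function works on R attributes through `setAttrib()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy array: a payload plus an optional dims "attribute". */
typedef struct toy_array {
    size_t length;
    double *data;
    int has_dims;
    size_t nrow, ncol;
} toy_array;

/* Mutating version, in the spirit of DropDims(): strips the dims of x
   itself and returns x so the call can be nested in an expression. */
static toy_array *toy_drop_dims(toy_array *x)
{
    assert(x != NULL);
    x->has_dims = 0;
    x->nrow = x->ncol = 0;
    return x;
}

/* Toy duplicate(): deep copy of payload and attributes. */
static toy_array *toy_array_dup(const toy_array *src)
{
    toy_array *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    *copy = *src;                                  /* copies header fields */
    size_t bytes = src->length * sizeof *src->data;
    copy->data = malloc(bytes > 0 ? bytes : 1);    /* avoid malloc(0) */
    if (copy->data == NULL) {
        free(copy);
        return NULL;
    }
    if (bytes > 0)
        memcpy(copy->data, src->data, bytes);
    return copy;
}

/* Non-mutating wrapper: copy first, then mutate the copy. */
static toy_array *toy_drop_dims_copy(const toy_array *x)
{
    toy_array *copy = toy_array_dup(x);
    return copy == NULL ? NULL : toy_drop_dims(copy);
}
```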
>>
>> I also wondered why x is PROTECTed, actually. It's an argument; wasn't
>> it automatically protected? Is it no longer protected after the
>> function starts modifying it?
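On the PROTECT question: `DropDims()` is an internal helper that other C code in R calls directly, not a `.Call` entry point, so presumably it cannot rely on its argument having been protected by whoever called it. Protection keeps an object reachable for the garbage collector and does not change when the object's contents change; an extra PROTECT of an already protected object is harmless as long as PROTECT and UNPROTECT calls stay balanced. A toy protection stack (hypothetical, far simpler than R's) shows both points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy protection stack: the "collector" would treat anything
   on it as reachable. Membership depends only on pointer identity. */
#define TOY_PROTECT_MAX 16

typedef struct toy_protect_stack {
    const void *items[TOY_PROTECT_MAX];
    int top;
} toy_protect_stack;

static void toy_protect(toy_protect_stack *s, const void *p)
{
    assert(s->top < TOY_PROTECT_MAX);   /* R itself signals an error on overflow */
    s->items[s->top++] = p;
}

static void toy_unprotect(toy_protect_stack *s, int n)
{
    assert(n >= 0 && n <= s->top);      /* calls must stay balanced */
    s->top -= n;
}

/* Is p still on the stack? Changing *p's contents cannot affect this. */
static int toy_is_protected(const toy_protect_stack *s, const void *p)
{
    for (int i = 0; i < s->top; i++)
        if (s->items[i] == p)
            return 1;
    return 0;
}
```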
>>
>> Here's an example with similar usage in Writing R Extensions, section
>> 5.10.1 "Calling .Call". It protects the arguments a and b (is that
>> needed?), then changes them.
>>
>> #include <R.h>
>> #include <Rdefines.h>
>>
>> SEXP convolve2(SEXP a, SEXP b)
>> {
>>     R_len_t i, j, na, nb, nab;
>>     double *xa, *xb, *xab;
>>     SEXP ab;
>>
>>     PROTECT(a = AS_NUMERIC(a)); /* PJ wonders, doesn't this alter
>>                                    "a" in calling code? */
>>     PROTECT(b = AS_NUMERIC(b));
>>     na = LENGTH(a); nb = LENGTH(b); nab = na + nb - 1;
>>     PROTECT(ab = NEW_NUMERIC(nab));
>>     xa = NUMERIC_POINTER(a); xb = NUMERIC_POINTER(b);
>>     xab = NUMERIC_POINTER(ab);
>>     for(i = 0; i < nab; i++) xab[i] = 0.0;
>>     for(i = 0; i < na; i++)
>>         for(j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
>>     UNPROTECT(3);
>>     return(ab);
>> }
>>
>>
>> Doesn't
>>
>> PROTECT(a = AS_NUMERIC(a));
>>
>> alter the data structure "a" both inside the C function and
>> in the calling R code as well? And, if a was PROTECTed automatically,
>> could we do without that PROTECT()?
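The answer to the first question hinges on `a` being a C local variable that holds a pointer. In Rdefines.h, `AS_NUMERIC(x)` expands to `coerceVector(x, REALSXP)`, which returns the same object when it is already double and a newly allocated one otherwise; either way the caller's object is left alone, and `a = ...` rebinds only the local variable. A newly allocated result is not one of the automatically protected arguments, so it does need PROTECT before the next allocation (`NEW_NUMERIC`) can trigger garbage collection. The toy sketch below mimics this without R headers; all names are hypothetical, and it frees the temporary explicitly where R would rely on its garbage collector.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy vector that is either integer or double. */
typedef enum { TOY_INT, TOY_REAL } toy_type;

typedef struct toy_vec {
    toy_type type;
    size_t length;
    int *ival;      /* used when type == TOY_INT */
    double *rval;   /* used when type == TOY_REAL */
} toy_vec;

/* In the spirit of AS_NUMERIC(): returns x itself if it is already
   double, otherwise a newly allocated double copy. x is never changed. */
static toy_vec *toy_as_real(toy_vec *x)
{
    assert(x != NULL);
    if (x->type == TOY_REAL)
        return x;
    toy_vec *out = malloc(sizeof *out);
    if (out == NULL)
        return NULL;
    out->rval = malloc((x->length > 0 ? x->length : 1) * sizeof *out->rval);
    if (out->rval == NULL) {
        free(out);
        return NULL;
    }
    for (size_t i = 0; i < x->length; i++)
        out->rval[i] = (double) x->ival[i];
    out->type = TOY_REAL;
    out->length = x->length;
    out->ival = NULL;
    return out;
}

/* Frees only vectors created by toy_as_real (heap-allocated). */
static void toy_vec_free(toy_vec *v)
{
    if (v != NULL) {
        free(v->ival);
        free(v->rval);
        free(v);
    }
}

/* Like convolve2's "a = AS_NUMERIC(a)": the assignment rebinds the local
   parameter only; the caller's pointer and object are left untouched. */
static double toy_sum_real(toy_vec *a)
{
    toy_vec *orig = a;
    a = toy_as_real(a);
    if (a == NULL)
        return 0.0;   /* toy: allocation failure; R would signal an error */
    double s = 0.0;
    for (size_t i = 0; i < a->length; i++)
        s += a->rval[i];
    if (a != orig)
        toy_vec_free(a);   /* toy cleanup; R leaves this to its GC */
    return s;
}
```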
>>
>> pj
>>
>> --
>> Paul E. Johnson
>> Professor, Political Science, University of Kansas
>> Assoc. Director, Center for Research Methods, University of Kansas
>> 1541 Lilac Lane, Room 504
>> http://pj.freefaculty.org
>> http://quant.ku.edu
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
> 
> 
> 

Received on Mon 10 Dec 2012 - 21:51:34 GMT
