Re: [Rd] sapply improvements

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Thu, 05 Nov 2009 10:06:55 -0500

On 11/5/2009 4:05 AM, Martin Maechler wrote:

>>>>>> "PD" == Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
>>>>>>     on Thu, 05 Nov 2009 00:28:51 +0100 writes:

>
> PD> William Dunlap wrote: ...
> >>>
> >>> if (x <= 0) NA else log(x)
> >>>
> >>> variety otherwise.
> >>
> >> Would you only want it to coerce upwards to FUN.VALUES's
> >> type? E.g., allow sapply(z, length,
> >> FUN.VALUE=numeric(1)) to return a numeric vector but die
> >> on sapply(z, function(zi)as.complex(zi[1]),
> >> FUN.VALUE=numeric(1)) If the latter doesn't die should it
> >> return a complex or a numeric vector? (I'd say it needs
> >> to be numeric, but I'd prefer that it died.)
>
> PD> I'd say that it should probably die on downwards
> PD> coercion. Getting a double when an integer is expected,
> PD> or complex instead of double as you indicate, is a
> PD> likely user error. If not, then the user can always
> PD> coerce explicitly inside FUN.
>
> I agree with Peter: Do not allow coercion downwards
>
> PD> Another issue is whether one would want to go beyond the
> PD> base classes of S (logical, integer, double, complex,
> PD> character). For other classes, there may be no notion of
> PD> "up" and "down" in coercion. Then again, sapply was
> PD> always limited to what unlist() will handle, so e.g.
>
> >> sapply(1:10,FUN=function(i)Sys.Date())
> PD> [1] 14553 14553 14553 14553 14553 14553 14553 14553
> PD> 14553 14553
>
> PD> as opposed to
>
> >> structure(rep(14553,10), class="Date")
> PD> [1] "2009-11-05" "2009-11-05" "2009-11-05"
> PD> "2009-11-05" "2009-11-05" [6] "2009-11-05" "2009-11-05"
> PD> "2009-11-05" "2009-11-05" "2009-11-05"
>
> Well, using
> as(<prelim_result>, class(<prototype>) )
>
> would be really nice here....
> but alas, we are still not allowed to use as(.,.) in base
> code which I'd tend to call a "design bug" nowadays..

Part of the difficulty here is that we have too many concepts of "class" and "type" in R. For example, as() is not consistent with as.vector() in the following sense:

If neither input is an S4 object, we should have

as(<prelim_result>, class(<prototype>) )

be the same as

as.vector(<prelim_result>, typeof(<prototype>))

and

as.vector(<prelim_result>, class(<prototype>))

but currently as() gives a different result. For example,

 > str(as(1:10, class(double(1))))
  int [1:10] 1 2 3 4 5 6 7 8 9 10
 > str(as.vector(1:10, typeof(double(1))))
  num [1:10] 1 2 3 4 5 6 7 8 9 10
 > str(as.vector(1:10, class(double(1))))
  num [1:10] 1 2 3 4 5 6 7 8 9 10

So if the coercion were to support as(), we'd need to decide when to follow its rules, and when to follow the existing as.vector() rules (which I think we're more or less following in the current sapply()).
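To make the as.vector() route concrete, here is a rough sketch of an upward-only coercion over the basic atomic types. The helper name coerce_up and the type_rank table are made up for illustration; nothing like this exists in base R, and the ordering is just my reading of "up" and "down" in the discussion above.

```r
# Hypothetical ranking of the basic atomic types, lowest to highest.
# This ordering is an assumption for this sketch, not an R internal.
type_rank <- c(logical = 1L, integer = 2L, double = 3L,
               complex = 4L, character = 5L)

# Hypothetical helper: coerce 'result' to the type of 'prototype'
# using as.vector() rules, but refuse to coerce downwards.
coerce_up <- function(result, prototype) {
  from <- typeof(result)
  to <- typeof(prototype)
  if (!(from %in% names(type_rank)) || !(to %in% names(type_rank)))
    stop("only the basic atomic types are handled in this sketch")
  if (type_rank[[from]] > type_rank[[to]])
    stop("downward coercion from ", from, " to ", to, " is not allowed")
  as.vector(result, to)
}
```

With this, coerce_up(1:10, double(1)) gives a double vector, while coerce_up(1i, double(1)) stops with an error rather than silently dropping the imaginary part.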

We'd also need to handle the cases involving S4 objects:

I'd say if the prototype is not S4 but the result is, we should die with an error.

If the prototype is S4, then we should use as(). We have fast C code to detect S4 objects, but do we have C code to do the coercion? I'd rather not write it, but I wouldn't object if someone else did/already has.
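At the R level, the rules I'm suggesting could be sketched roughly as follows. The function name coerce_to_prototype is hypothetical, and this is only an R-level illustration of the decision logic, not the C code asked about above.

```r
library(methods)

# Hypothetical helper: pick the coercion rule from the S4-ness of the
# prototype and of the preliminary result.
coerce_to_prototype <- function(result, prototype) {
  if (isS4(prototype)) {
    # S4 prototype: follow the as() rules
    as(result, class(prototype))
  } else if (isS4(result)) {
    # S4 result but plain prototype: die with an error
    stop("result is an S4 object but the prototype is not")
  } else {
    # neither is S4: follow the existing as.vector() rules
    as.vector(result, typeof(prototype))
  }
}
```

For plain vectors this reduces to the as.vector() call shown earlier, so coerce_to_prototype(1:10, double(1)) returns a double vector.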

Duncan Murdoch



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 05 Nov 2009 - 15:10:36 GMT