Re: [Rd] sapply improvements

From: William Dunlap <wdunlap_at_tibco.com>
Date: Wed, 04 Nov 2009 13:30:18 -0800

> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard_at_biostat.ku.dk]
> Sent: Wednesday, November 04, 2009 1:16 PM
> To: William Dunlap
> Cc: Duncan Murdoch; r-devel_at_r-project.org
> Subject: Re: [Rd] sapply improvements
>
> William Dunlap wrote:
> > It looks good on the following examples:
> >
> >> z <- split(log(1:10), rep(letters[1:2],c(3,7)))
> >> sapply(z, length, FUN.VALUE=numeric(1))
> > Error in sapply(z, length, FUN.VALUE = numeric(1)) :
> > FUN values must be of type 'double'
> >
> > (I'd like the error to say "... must be of type 'double',
> > not 'integer'", to give the user a fuller diagnosis of
> > the problem.)
>
> Umm, not following too closely, but would it not be preferable just to
> coerce in such cases? I can see a lot of issues of the
>
> if (x <= 0) NA else log(x)
>
> variety otherwise.

Would you only want it to coerce upwards to FUN.VALUE's type? E.g., allow

   sapply(z, length, FUN.VALUE=numeric(1))

to return a numeric vector but die on

   sapply(z, function(zi)as.complex(zi[1]), FUN.VALUE=numeric(1))

If the latter doesn't die, should it return a complex or a numeric vector?
(I'd say it needs to be numeric, but I'd prefer that it died.)
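
The "coerce upwards only" policy asked about here can be tried out with base R's vapply(), which takes a FUN.VALUE template of the same kind. This is a minimal sketch for comparison and not the sapply() patch under discussion; the names `lens` and `res` are illustrative only.

```r
z <- split(log(1:10), rep(letters[1:2], c(3, 7)))

# length() returns integers; vapply() promotes them to the declared
# double type (logical < integer < double < complex).
lens <- vapply(z, length, FUN.VALUE = numeric(1))
typeof(lens)

# A complex result would have to be demoted to fit a double template,
# so vapply() signals an error instead; capture it for inspection.
res <- tryCatch(
  vapply(z, function(zi) as.complex(zi[1]), FUN.VALUE = numeric(1)),
  error = function(e) e
)
conditionMessage(res)
```

Under this policy the first call returns a double vector and the second dies, which matches the preference stated above.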

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> >> sapply(z, range, FUN.VALUE=c(Min=0,Max=0))
> >            a        b
> > Min 0.000000 1.386294
> > Max 1.098612 2.302585
> >
> > Exactly matching the typeof's and using the names
> > for row.names on matrix output seem good to me.
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >> -----Original Message-----
> >> From: Duncan Murdoch [mailto:murdoch_at_stats.uwo.ca]
> >> Sent: Wednesday, November 04, 2009 12:24 PM
> >> To: William Dunlap
> >> Cc: michael.m.spiegel_at_gmail.com; r-devel_at_stat.math.ethz.ch
> >> Subject: sapply improvements
> >>
> >> On 11/4/2009 12:15 PM, William Dunlap wrote:
> >>>> -----Original Message-----
> >>>> From: r-devel-bounces_at_r-project.org
> >>>> [mailto:r-devel-bounces_at_r-project.org] On Behalf Of Duncan Murdoch
> >>>> Sent: Wednesday, November 04, 2009 8:47 AM
> >>>> To: michael.m.spiegel_at_gmail.com
> >>>> Cc: R-bugs_at_r-project.org; r-devel_at_stat.math.ethz.ch
> >>>> Subject: Re: [Rd] error in install.packages() (PR#14042)
> >>>>
> > ...
> >>>> For future reference: the problem was that it assigned the result of
> >>>> sapply() to a subset of a vector. Normally sapply() simplifies its
> >>>> result to a vector, but in this case the result was empty, so sapply()
> >>>> returned an empty list; assigning a list to a vector coerced the vector
> >>>> to a list, and then the "invalid subscript type 'list'" came soon after.
> >>> I've run into this sort of problem a lot (0-long input to sapply
> >>> causes it to return list()). A related problem is that when sapply's
> >>> FUN doesn't always return the type of value you expect for some
> >>> corner case then sapply won't do the expected simplification. If
> >>> sapply had an argument that gave the expected form of FUN's output
> >>> then sapply could (a) die if some call to FUN didn't return something
> >>> of that form and (b) return a 0-long object of the correct form
> >>> if sapply's X has length zero so FUN is never called. E.g.,
> >>>    sapply(2:0, function(i)(11:20)[i], FUN.VALUE=integer(1)) # die on third iteration
> >>>    sapply(integer(0), function(i)i>0, FUN.VALUE=logical(1)) # return logical(0)
> >>>
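
The two corner cases in the quoted paragraph can be reproduced as follows. This is a sketch only: it uses base R's vapply() as a stand-in for the proposed FUN.VALUE argument to sapply(), and the variable names are illustrative.

```r
# Without a template, zero-length input cannot be simplified:
# sapply() hands back an empty list rather than logical(0).
empty_plain <- sapply(integer(0), function(i) i > 0)
typeof(empty_plain)

# With a template, the zero-length result has the declared type.
empty_typed <- vapply(integer(0), function(i) i > 0, FUN.VALUE = logical(1))
typeof(empty_typed)

# For i = 0 the subscript selects nothing, so FUN returns integer(0),
# which does not match the length-1 template; the call fails there.
third <- tryCatch(
  vapply(2:0, function(i) (11:20)[i], FUN.VALUE = integer(1)),
  error = function(e) e
)
conditionMessage(third)
```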
> >>> Another benefit of sapply knowing the type of FUN's return value is
> >>> that it wouldn't have to waste space creating a list of FUN's return
> >>> values but could stuff them directly into the final output structure.
> >>> A list of n scalar doubles is 4.5 times bigger than double(n) and the
> >>> factor is 9.0 for integers and logicals.
> >>
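
The size overhead mentioned in the quoted paragraph can be checked roughly with object.size(). This is a sketch under assumptions: the 4.5 and 9.0 factors are the figures given in the message, and the ratio measured here depends on the R build (pointer size, vector header size), so it will likely differ. The value of `n` and the variable names are arbitrary.

```r
n <- 10000L

as_list   <- as.list(numeric(n))  # n separate length-1 double vectors
as_vector <- numeric(n)           # one double vector of length n

# object.size() gives an estimate in bytes; compare the two layouts.
ratio <- as.numeric(utils::object.size(as_list)) /
  as.numeric(utils::object.size(as_vector))
ratio
```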
> >> What do you think of the behaviour of the sapply function below? (I
> >> wouldn't put it into R as it is, I'd translate it to C code to avoid the
> >> lapply call; but I'd like to get the behaviour right before doing that.)
> >>
> >> This one checks that the length() and typeof() results are consistent.
> >> If the FUN.VALUE has names, those are used (but it doesn't require the
> >> names from FUN to match).
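
The function posted at this point is elided in the archive. The behaviour described (exact length() and typeof() match, names taken from FUN.VALUE) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the name `sapply_checked`, the error wording, and the restriction to an atomic FUN.VALUE of length at least 1 are hypothetical and not taken from the thread's code.

```r
# Hypothetical helper for illustration; NOT the function from the thread.
# Assumes FUN.VALUE is an atomic vector of length >= 1.
sapply_checked <- function(X, FUN, FUN.VALUE, ...) {
  FUN <- match.fun(FUN)
  answer <- lapply(X, FUN, ...)
  want_len <- length(FUN.VALUE)
  want_type <- typeof(FUN.VALUE)

  # Every result must match the template's length and type exactly.
  for (i in seq_along(answer)) {
    val <- answer[[i]]
    if (length(val) != want_len)
      stop(sprintf("FUN values must be of length %d, not %d",
                   want_len, length(val)))
    if (typeof(val) != want_type)
      stop(sprintf("FUN values must be of type '%s', not '%s'",
                   want_type, typeof(val)))
  }

  # Zero-length input: unlist() gives NULL, so build an empty typed vector.
  flat <- unlist(answer, use.names = FALSE)
  if (is.null(flat)) flat <- vector(want_type, 0L)

  if (want_len == 1L) {
    names(flat) <- names(X)
    return(flat)
  }
  # Row names come from FUN.VALUE; names returned by FUN are ignored.
  matrix(flat, nrow = want_len, ncol = length(answer),
         dimnames = list(names(FUN.VALUE), names(X)))
}

z <- split(log(1:10), rep(letters[1:2], c(3, 7)))
m <- sapply_checked(z, range, FUN.VALUE = c(Min = 0, Max = 0))
bad <- tryCatch(sapply_checked(z, length, FUN.VALUE = numeric(1)),
                error = function(e) conditionMessage(e))
empty <- sapply_checked(list(), range, FUN.VALUE = c(Min = 0, Max = 0))
```

With strict type matching, the `length` call fails because its results are integer rather than double, and zero-length input still yields a matrix with the template's row names and no columns.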
> > ...
> >
> > ______________________________________________
> > R-devel_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907
>



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Wed 04 Nov 2009 - 21:33:08 GMT
