Re: [Rd] often unnecessary duplicate in sapply / as.vector

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue 11 Jul 2006 - 15:25:39 GMT

On Tue, 11 Jul 2006, Thomas Lumley wrote:

> On Tue, 11 Jul 2006, Prof Brian Ripley wrote:
>
> > On Fri, 7 Jul 2006, Thomas Lumley wrote:
> >
> > > On Fri, 7 Jul 2006, Martin Morgan wrote:
> > >
> > > > sapply calls lapply as
> > > >
> > > > answer <- lapply(as.list(X), FUN, ...)
> > > >
> > > > which, when X is a list, causes X to be duplicated unnecessarily. The
> > > > coercion is unnecessary for other mode(X) because in lapply we have
> > > >
> > > > if (!is.list(X)) X <- as.list(X)
> > >
> > > That looks reasonable.
> >
> > And you have made the change. Unfortunately it is not really reasonable,
> > as is.list(X) does not test that X is a list (see its documentation) in
> > the same sense as as.list, so pairlists are now passed to the internal
> > code.
>
> Where do we still get pairlists in interpreted code? I thought they had all
> been hidden.

Not quite all. You can use pairlist() to create them, and .Options is one (fairly long) example. (I used pairlist to create a very slow example.)
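
A small illustration of the point about is.list(), as a sketch only (the object name pl is made up for the example): a non-empty pairlist passes the is.list() test, although it is not a "list" in the typeof() sense until as.list() has coerced it.

```r
# 'pl' is a hypothetical example object, not something from the thread.
pl <- pairlist(a = 1, b = 2)

typeof(pl)           # "pairlist"
is.list(pl)          # TRUE: is.list() accepts non-empty pairlists
is.pairlist(pl)      # TRUE

# as.list() is what actually produces a generic vector.
typeof(as.list(pl))  # "list"

# .Options, mentioned above, is also stored as a pairlist.
is.pairlist(.Options)
```

So a guard of the form if (!is.list(X)) X <- as.list(X) leaves a pairlist uncoerced.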

> > There's something rather undesirable going on here. The internal code for
> > lapply (in its current version, not the one I wrote) does the internal
> > equivalent of
> >
> > rval <- vector("list", length(X))
> > for(i in seq(along = X))
> >     rval[i] <- list(FUN(X[[i]], ...))
> >
> > from the earlier
> >
> > lapply <- function(X, FUN, ...) {
> >     FUN <- match.fun(FUN)
> >     if (!is.list(X))
> >         X <- as.list(X)
> >     rval <- vector("list", length(X))
> >     for(i in seq(along = X))
> >         rval[i] <- list(FUN(X[[i]], ...))
> >     names(rval) <- names(X) # keep `names' !
> >     return(rval)
> > }
> >
> > so all that is needed is that X[[i]] work.
> >
> > For a pairlist [[i]] done repeatedly is very inefficient (since it starts
> > at the beginning each time), so we *do* want to coerce pairlists here.
>
> Or have a separate loop using CDR and CAR rather than [[, which would mean not
> having to copy X.

If we are going there we should also special-case all the (much more common) vector types thereby avoiding [[, which I have so far resisted.
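
The cost of repeated [[i]] on a pairlist, discussed above, can be seen roughly with something like the following. This is only a sketch: the size n and the object names are arbitrary choices for the example, and the timings will depend on the machine and the R version.

```r
# Arbitrary illustrative size; increase it to make the difference clearer.
n <- 10000

vl <- as.list(seq_len(n))   # generic vector ("list")
pl <- as.pairlist(vl)       # the same elements as a pairlist

# Each pl[[i]] walks the pairlist from its head, so the whole loop
# does work that grows roughly quadratically with n.
t_pair <- system.time(for (i in seq_len(n)) pl[[i]])

# Indexing a generic vector does not depend on the position i.
t_list <- system.time(for (i in seq_len(n)) vl[[i]])

print(t_pair)
print(t_list)
```

Coercing once with as.list() costs a single pass and a copy, which is why coercion is worthwhile for pairlists even though it is wasted effort for ordinary lists.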

> > On the other hand, we do not need to coerce expressions or atomic vectors
> > for which [[]] works just fine.
>
> Indeed.

I've just committed a version that is a lot faster, fast enough to shave 5% off the total time for both the stats and boot examples.
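
For readers following the thread, the coercion rule discussed above can be written at R level roughly as below. This is a hypothetical sketch, not the internal code that was committed: the name lapply_sketch and the exact test are assumptions made for illustration. It coerces pairlists (and anything that is not a list, atomic vector or expression) and leaves the rest alone.

```r
# Hypothetical R-level sketch of the rule discussed in this thread;
# it is NOT the committed internal implementation of lapply.
lapply_sketch <- function(X, FUN, ...) {
    FUN <- match.fun(FUN)
    # Coerce pairlists, where repeated [[i]] is slow, and anything that
    # is not a generic vector, atomic vector or expression.
    needs_coercion <- is.pairlist(X) ||
        !(typeof(X) == "list" || is.atomic(X) || is.expression(X))
    if (needs_coercion)
        X <- as.list(X)
    rval <- vector("list", length(X))
    for (i in seq_along(X))
        rval[i] <- list(FUN(X[[i]], ...))
    names(rval) <- names(X)   # keep names
    rval
}

# Usage on the three kinds of input mentioned above.
lapply_sketch(1:3, function(x) x^2)                        # atomic: no coercion
lapply_sketch(pairlist(a = 1, b = 2), function(x) x + 1)   # pairlist: coerced
lapply_sketch(expression(a + b, c), class)                 # expression: no coercion
```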

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Jul 12 02:42:04 2006

