Re: [Rd] often unnecessary duplicate in sapply / as.vector

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue 11 Jul 2006 - 12:20:17 GMT

On Fri, 7 Jul 2006, Thomas Lumley wrote:

> On Fri, 7 Jul 2006, Martin Morgan wrote:
>
> > sapply calls lapply as
> >
> > answer <- lapply(as.list(X), FUN, ...)
> >
> > which, when X is a list, causes X to be duplicated unnecessarily. The
> > coercion is unnecessary for other mode(X) because in lapply we have
> >
> > if (!is.list(X)) X <- as.list(X)
>
> That looks reasonable.

And you have made the change. Unfortunately it is not really reasonable, as is.list(X) does not test that X is a list (see its documentation) in the same sense as as.list, so pairlists are now passed to the internal code.

There's something rather undesirable going on here. The internal code for lapply (in its current version, not the one I wrote) does the internal equivalent of

    rval <- vector("list", length(X))
    for(i in seq(along = X))

        rval[i] <- list(FUN(X[[i]], ...))

from the earlier

lapply <- function(X, FUN, ...) {

    FUN <- match.fun(FUN)
    if (!is.list(X))

        X <- as.list(X)
    rval <- vector("list", length(X))
    for(i in seq(along = X))

        rval[i] <- list(FUN(X[[i]], ...))
    names(rval) <- names(X)               # keep `names' !
    return(rval)
}

so all that is needed is that X[[i]] work.

For a pairlist [[i]] done repeatedly is very inefficient (since it starts at the beginning each time), so we *do* want to coerce pairlists here.

On the other hand, we do not need to coerce expressions or atomic vectors for which [[]] works just fine.

> > More generally, perhaps as.vector might not duplicate when mode(x) == mode ?
>
> This isn't a trivial change, because mode(x)==mode does not guarantee
> that as.vector(x, mode) has no effect. For example, with mode="numeric" it
> removes attributes.

And with mode="list" it does not (although that is not as documented). We can certainly do better. [This is another of those cases where 'mode' is confusing, and in fact it would be typeof(x) == mode.]

However, for now let us concentrate on as.list.default, which does

as.list.default <- function (x, ...)
{

    if (is.function(x))
	return(c(formals(x), list(body(x))))
    if (is.expression(x)) {
	n <- length(x)
	l <- vector("list", n)
	i <- 0
	for (sub in x) l[[i <- i + 1]] <- sub
	return(l)

    }
    .Internal(as.vector(x, "list"))
}

That's a bit strange, as an expression is internally a list, and it loses the names on the expression. I intend to make as.list(x) return x unchanged if x is a list (not a pairlist), and to coerce expressions internally.

After that I will think about making as.vector and lapply make fewer copies.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue Jul 11 22:23:41 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 11 Jul 2006 - 20:28:19 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.