[R] tapply huge speed difference if X has names

From: Matthew Dowle <mdowle_at_concordiafunds.com>
Date: Tue 09 Aug 2005 - 04:15:19 EST

Hi all,

Apologies if this has been raised before. R's tapply is very fast, but if X has names, as in the example below, there is a huge slowdown: under 1 second compared with about 151 seconds. The following timings are repeatable and were taken on a single-user machine:

> X = 1:100000
> names(X) = X
> system.time(fast<<-tapply(as.vector(X), rep(1:10000,each=10), mean)) # as.vector() to drop the names
[1] 0.36 0.00 0.35 0.00 0.00
> system.time(slow<<-tapply(X, rep(1:10000,each=10), mean))
[1] 149.95 1.83 151.79 0.00 0.00
> head(fast)
   1    2    3    4    5    6
 5.5 15.5 25.5 35.5 45.5 55.5
> head(slow)
   1    2    3    4    5    6
 5.5 15.5 25.5 35.5 45.5 55.5
> identical(fast,slow)
[1] TRUE
>

Looking inside tapply, which then calls split, it seems there is an is.null(names(x)) which prevents R's internal fast version from being called. Why is that there? Could it be removed? I often do something like tapply(mat[,"colname"],...) where mat has rownames. Therefore the rownames of mat become the names of the vector mat[,"colname"], and this seems to slow down tapply a lot. Perhaps other functions which call split also suffer this problem?
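
The matrix case described above can be sketched as follows. This is only an illustration: the matrix `mat`, its row names, and the grouping vector `groups` are made up for the example, and the workaround is the same `as.vector()` call used in the timings above.

```r
# Hypothetical data: a matrix with row names, as in the use case described above.
mat <- matrix(1:200, ncol = 2,
              dimnames = list(paste("row", 1:100, sep = ""), c("a", "b")))
groups <- rep(1:10, each = 10)

col <- mat[, "a"]                # the column inherits the row names of mat
stopifnot(!is.null(names(col)))  # so split.default would skip its internal fast path

# as.vector() drops the names, so the internal split is used again.
res <- tapply(as.vector(col), groups, mean)
```

The group means are unaffected by dropping the names, since tapply labels its result by the grouping factor rather than by the names of X.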

> split.default
function (x, f)
{
    if (is.list(f))
        f <- interaction(f)
    f <- factor(f)
    if (is.null(attr(x, "class")) && is.null(names(x)))
        return(.Internal(split(x, f)))
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    for (k in lf) y[[k]] <- x[f %in% k]
    y
}
<environment: namespace:base>
>
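
One possible way around the slow loop, offered as a sketch rather than a proposed patch: split the positions `seq_along(x)` instead of `x` itself. The positions are a plain integer vector without names or class, so they pass the check above and go through the fast internal split, and subsetting `x` by each group of positions keeps its names. The function name `split_by_index` is hypothetical and not part of base R.

```r
# Hypothetical helper: split a named vector via its positions.
split_by_index <- function(x, f) {
    if (is.list(f))
        f <- interaction(f)
    f <- factor(f)
    # seq_along(x) has no names and no class, so this split takes the fast path.
    idx <- split(seq_along(x), f)
    # Subsetting by position preserves the names of x within each group.
    lapply(idx, function(i) x[i])
}

X <- 1:20
names(X) <- X
parts <- split_by_index(X, rep(1:2, each = 10))
```

Here `parts` is a list with one element per level of the grouping factor, each a named subvector of `X`. This avoids evaluating `f %in% k` once per level, which is presumably where the time goes when there are 10000 levels.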

> version
         _
platform x86_64-redhat-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    0.1
year     2004
month    11
day      15
language R
>

Thanks and regards,
Matthew

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
Received on Tue Aug 09 04:13:55 2005