Re: [R] tapply huge speed difference if X has names

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue 09 Aug 2005 - 05:36:17 EST

Please use a current version of R!

This was fixed long ago, and you will find it in the NEWS file:

         split() now handles vectors with names internally and so is
         almost as fast as on vectors without names (and maybe 100x
         faster than before).
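
To check whether a given installation is affected, the comparison from the quoted message below can be rerun roughly as follows. This is only a sketch: unname() is used here in place of as.vector() to drop the names, the variable names g, t_plain and t_named are illustrative, and no timings are shown because they depend on the machine and the R version.

```r
# Sketch: time tapply() on the same data with and without names.
X <- 1:100000
names(X) <- X
g <- rep(1:10000, each = 10)            # 10000 groups of 10 values each

t_plain <- system.time(fast <- tapply(unname(X), g, mean))  # names dropped
t_named <- system.time(slow <- tapply(X, g, mean))          # names kept

# user / system / elapsed times, one row per variant
print(rbind(plain = t_plain, named = t_named)[, 1:3])

# The results themselves should not depend on whether X had names.
print(identical(fast, slow))
```

If the two elapsed times are of the same order, the installation already has the faster split().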


On Mon, 8 Aug 2005, Matthew Dowle wrote:

>
> Hi all,
>
> Apologies if this has been raised before ... R's tapply is very fast, but if
> X has names in this example, there seems to be a huge slowdown: under 1
> second compared to 151 seconds. The following timings are repeatable and
> were taken on a single-user machine:
>
>> X = 1:100000
>> names(X) = X
>> system.time(fast<<-tapply(as.vector(X), rep(1:10000,each=10), mean)) # as.vector() to drop the names
> [1] 0.36 0.00 0.35 0.00 0.00
>> system.time(slow<<-tapply(X, rep(1:10000,each=10), mean))
> [1] 149.95 1.83 151.79 0.00 0.00
>> head(fast)
>    1    2    3    4    5    6
>  5.5 15.5 25.5 35.5 45.5 55.5
>> head(slow)
>    1    2    3    4    5    6
>  5.5 15.5 25.5 35.5 45.5 55.5
>> identical(fast,slow)
> [1] TRUE
>>
>
> Looking inside tapply, which then calls split, it seems there is an
> is.null(names(x)) which prevents R's internal fast version from being
> called. Why is that there? Could it be removed? I often do something like
> tapply(mat[,"colname"],...) where mat has rownames. Therefore the rownames
> of mat become the names of the vector mat[,"colname"], and this seems to
> slow down tapply a lot. Perhaps other functions which call split also suffer
> this problem?
>
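To illustrate the point about row names: a column extracted from a matrix that has rownames is a named vector, and unname() strips those names before tapply() sees them. The matrix, the column labels and the grouping vector below are made-up toy data, not taken from the original post.

```r
# Toy matrix with rownames (hypothetical data for illustration).
mat <- matrix(1:6, nrow = 3,
              dimnames = list(c("a", "b", "c"), c("colname", "other")))

col <- mat[, "colname"]
print(names(col))        # the rownames of mat have become names of the vector

grp <- c("g1", "g1", "g2")

# Dropping the names first keeps the input to split() a plain vector.
res <- tapply(unname(col), grp, mean)
print(res)
```

On versions of R where split() handles names internally, the unname() call should not be needed for speed; it is shown only as a workaround for older installations.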
>> split.default
> function (x, f)
> {
>     if (is.list(f))
>         f <- interaction(f)
>     f <- factor(f)
>     if (is.null(attr(x, "class")) && is.null(names(x)))
>         return(.Internal(split(x, f)))
>     lf <- levels(f)
>     y <- vector("list", length(lf))
>     names(y) <- lf
>     for (k in lf) y[[k]] <- x[f %in% k]
>     y
> }
> <environment: namespace:base>
>>
>
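The R-level fallback in the listing above can be sketched as a standalone function to see why it is slow: it makes one full pass over x for every level of f, so with 10000 levels and 100000 elements the work grows roughly as levels times length. The name split_loop and the small test vectors are hypothetical, chosen only for this illustration.

```r
# Sketch of the interpreted fallback path from the split.default listing.
split_loop <- function(x, f) {
  f <- factor(f)
  lf <- levels(f)
  y <- vector("list", length(lf))
  names(y) <- lf
  # One full scan of f (and subset of x) per level.
  for (k in lf) y[[k]] <- x[f %in% k]
  y
}

x <- c(a = 1, b = 2, c = 3, d = 4)   # named vector, as in the slow case
f <- c("u", "v", "u", "v")

by_loop <- split_loop(x, f)
by_split <- split(x, f)

print(by_loop)
print(isTRUE(all.equal(by_loop, by_split)))
```

Both routes group the same elements and keep their names; the difference is only in how much work is done per level.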
>> version
>          _
> platform x86_64-redhat-linux-gnu
> arch     x86_64
> os       linux-gnu
> system   x86_64, linux-gnu
> status
> major    2
> minor    0.1
> year     2004
> month    11
> day      15
> language R
>>
>
>
> Thanks and regards,
> Matthew
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Tue Aug 09 05:41:41 2005