Re: [Rd] Suggestion: 20% speed up of which() with two-character mod

From: Henrik Bengtsson <hb_at_stat.berkeley.edu>
Date: Mon, 04 Aug 2008 21:14:12 -0700

Hi,

I just want to do a follow up this very simple fix/correction/speedup/cleanup of the base::which() function. Here is a diff:

diff src/library/base/R/which.R which.R
21c21
< wh <- seq_along(x)[ll <- x & !is.na(x)]

---

> wh <- seq_along(x)[x & !is.na(x)]
25c25 < names(wh) <- names(x)[ll] ---
> names(wh) <- names(x)[wh]
FYI, the 'll' variable is not used elsewhere. I've been going through this modifications several times and I cannot see any side effects. Could someone of R core please commit this? BTW, when one report diff:s, do you prefer to get it with or without context information, e.g. -C 3? /Henrik On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry <cberry_at_tajo.ucsd.edu> wrote:
> On Thu, 10 Jul 2008, Henrik Bengtsson wrote:
>
>> Hi,
>>
>> by replacing 'll' with 'wh' in the source code for base::which() one
>> gets ~20% speed up for *named logical vectors*.
>
>
> The amount of speedup depends on how sparse the TRUE values are.
>
> When the proportion of TRUEs gets small the speedup is more than twofold on
> my macbook. For high proportions of TRUE, the speedup is more like the 20%
> you cite.
>
> HTH,
>
> Chuck
>
>>
>> CURRENT CODE:
>>
>> which <- function(x, arr.ind = FALSE)
>> {
>> if(!is.logical(x))
>> stop("argument to 'which' is not logical")
>> wh <- seq_along(x)[ll <- x & !is.na(x)]
>> m <- length(wh)
>> dl <- dim(x)
>> if (is.null(dl) || !arr.ind) {
>> names(wh) <- names(x)[ll]
>> }
>> ...
>> wh;
>> }
>>
>> SUGGESTED CODE: (Remove 'll' and use 'wh')
>>
>> which2 <- function(x, arr.ind = FALSE)
>> {
>> if(!is.logical(x))
>> stop("argument to 'which' is not logical")
>> wh <- seq_along(x)[x & !is.na(x)]
>> m <- length(wh)
>> dl <- dim(x)
>> if (is.null(dl) || !arr.ind) {
>> names(wh) <- names(x)[wh]
>> }
>> ...
>> wh;
>> }
>>
>> That's all.
>>
>> BENCHMARKING:
>>
>> # To measure both in same environment
>> which1 <- base::which;
>> environment(which1) <- globalenv(); # Needed?
>>
>> N <- 1e6;
>> set.seed(0xbeef);
>> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
>> names(x) <- seq_along(x);
>> B <- 10;
>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
>> stopifnot(identical(idxs1, idxs2));
>> print(t1/t2);
>> # Fair benchmarking
>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
>> print(t1/t2);
>> ## user system elapsed
>> ## 1.283186 1.052632 1.250000
>>
>> You get similar results if you put for loop outside the system.time()
>> call (and sum up the timings).
>>
>> Cheers
>>
>> Henrik
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>>
https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive
> Medicine
> E mailto:cberry_at_tajo.ucsd.edu UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
>
>
>
______________________________________________ R-devel_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 05 Aug 2008 - 04:16:50 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 05 Aug 2008 - 13:35:55 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive