Re: [Rd] Any interest in "merge" and "by" implementations specifically for so

From: Kevin B. Hendricks <kevin.hendricks_at_sympatico.ca>
Date: Mon 31 Jul 2006 - 13:41:53 GMT

Hi Tom,

> Now, try sorting and using a loop:
>
>> idx <- order(i)
>> xs <- x[idx]
>> is <- i[idx]
>> idx <- which(diff(is) > 0)
>> startidx <- c(1, idx+1)
>> endidx <- c(idx, length(xs))
>> f1 <- function(x, startidx, endidx, FUN = sum) {
> + res <- numeric(length(startidx))
> + for (j in seq_along(startidx)) {
> + res[j] <- FUN(x[startidx[j]:endidx[j]])
> + }
> + res
> + }
>> unix.time(res1 <- f1(xs, startidx, endidx))
> [1] 6.86 0.00 7.04 NA NA

I wonder how much time the sorting, reordering, and creation of startidx and endidx would add to this time.
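One rough way to answer that is to time the setup steps on their own with system.time(). Tom's data aren't shown in this message, so x and i below are hypothetical stand-ins (1e6 values over 1e5 groups); timings will differ from his, and none are claimed here.

```r
# Sketch: time the preprocessing separately from the per-group loop.
# Assumption: x and i are hypothetical stand-ins for Tom's (unshown) data.
set.seed(1)
x <- rnorm(1e6)
i <- sample.int(1e5, 1e6, replace = TRUE)

setup_time <- system.time({
  idx <- order(i)              # permutation that sorts by group key
  xs  <- x[idx]                # values reordered by group
  is  <- i[idx]                # keys reordered (now ascending)
  brk <- which(diff(is) > 0)   # last position of every group except the final one
  startidx <- c(1L, brk + 1L)  # first position of each group
  endidx   <- c(brk, length(xs))
})
print(setup_time)
```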

Either way, your code is a nice, quick way to create the small integer factors I would need if the igroup functions get integrated. Thanks!
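For reference, a sketch of turning sorted keys into integer codes 1..k. It assumes the igroup functions would take codes of that form (their interface isn't spelled out in this thread) and uses a tiny hypothetical key vector so it runs on its own. Two equivalent routes are shown: bumping a counter at each key change, or expanding the startidx/endidx run lengths.

```r
# Hypothetical sorted keys standing in for `is` from Tom's code.
is <- c(3, 3, 7, 7, 7, 12, 40, 40)

# Route 1: increment the code wherever the sorted key changes.
codes <- cumsum(c(1L, diff(is) > 0))   # 1 1 2 2 2 3 4 4

# Route 2: rebuild the same codes from the group boundaries.
brk <- which(diff(is) > 0)
startidx <- c(1L, brk + 1L)
endidx   <- c(brk, length(is))
codes2 <- rep(seq_along(startidx), endidx - startidx + 1L)
```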

> For the case of sum (or averages), you can vectorize this using cumsum as
> follows. This won't work for median or max.
>
>> f2 <- function(x, startidx, endidx) {
> + cum <- cumsum(x)
> + res <- cum[endidx]
> + res[-1] <- res[-1] - cum[endidx[-length(endidx)]]
> + res
> + }
>> unix.time(res2 <- f2(xs, startidx, endidx))
> [1] 0.20 0.00 0.21 NA NA

Yes, that is a very fast way to handle sums.
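Extending this to averages only needs the group sizes. Max and median can't come from cumsum, but for data already sorted by group there is a different trick (an addition here, not from Tom's message): re-sort the values within each group with order(is, xs), after which each group's max sits at endidx and its median at fixed offsets from startidx. A sketch on small hypothetical data:

```r
# Hypothetical data already sorted by group key: groups of sizes 3, 2, 4.
xs <- c(2, 4, 9, 10, 20, 1, 5, 3, 7)
is <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
brk <- which(diff(is) > 0)
startidx <- c(1L, brk + 1L)
endidx   <- c(brk, length(xs))
n <- endidx - startidx + 1L            # group sizes

# Sums via cumsum (Tom's f2 idea), then means.
cum   <- cumsum(xs)
sums  <- cum[endidx] - c(0, cum[endidx[-length(endidx)]])
means <- sums / n

# Sort values within each group; groups keep their positions because
# `is` is already ascending and each group is contiguous.
ys   <- xs[order(is, xs)]
maxs <- ys[endidx]                     # largest value is last in each group

# Median: average the two middle elements (the same one when n is odd).
lo   <- startidx + (n - 1L) %/% 2L
hi   <- startidx + n %/% 2L
meds <- (ys[lo] + ys[hi]) / 2
```

One caveat on the cumsum route: differencing a long running total can lose floating-point precision when the total is large relative to the individual group sums.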

> You can also use Luke Tierney's byte compiler
> (http://www.stat.uiowa.edu/~luke/R/compiler/) to speed up the loop for
> functions where you can't vectorize:
>
>> library(compiler)
>> f3 <- cmpfun(f1)
> Note: local functions used: FUN
>> unix.time(res3 <- f3(xs, startidx, endidx))
> [1] 3.84 0.00 3.91 NA NA

That looks interesting. Does it only work for specific operating systems and processors? I will give it a try.
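A self-contained version of the compiler check, as a sketch on small hypothetical data. For context: the compiler package emits bytecode for R's own virtual machine rather than native machine code, so it is not tied to a particular processor. It has shipped with base R since 2.13.0, and since R 3.4.0 the JIT compiles closures by default, so on a current R the "interpreted" f1 is typically compiled automatically too and the gap seen above may largely disappear.

```r
library(compiler)

# Per-group loop, with the result allocated inside the function.
f1 <- function(x, startidx, endidx, FUN = sum) {
  res <- numeric(length(startidx))
  for (j in seq_along(startidx)) {
    res[j] <- FUN(x[startidx[j]:endidx[j]])
  }
  res
}
f3 <- cmpfun(f1)                       # explicitly byte-compiled copy

# Hypothetical sorted keys and arbitrary values.
set.seed(2)
is <- sort(sample.int(50, 1000, replace = TRUE))
xs <- rnorm(length(is))
brk <- which(diff(is) > 0)
startidx <- c(1L, brk + 1L)
endidx   <- c(brk, length(xs))

# max cannot use the cumsum shortcut, so it goes through the loop.
print(system.time(r1 <- f1(xs, startidx, endidx, FUN = max)))
print(system.time(r3 <- f3(xs, startidx, endidx, FUN = max)))
```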

Thanks,

Kevin



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon Jul 31 23:47:20 2006
