Re: [Rd] HOW TO AVOID LOOPS

From: Bill Dunlap <bill_at_insightful.com>
Date: Mon, 14 Apr 2008 16:41:38 -0700 (PDT)

On Mon, 14 Apr 2008, Stephen Milborrow wrote:

> > Le sam. 12 avr. à 12:47, carlos martinez a écrit :
> > Looking for a simple, effective a minimum execution time solution.
> >
> > For a vector as:
> >
> > c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
> >
> > To transform it to the following vector without using any loops:
> >
> > (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)
>
> Here is a fast solution using the Ra just-in-time compiler
> www.milbo.users.sonic.net/ra.
>
> jit(1)
> if (length(x) > 1)
> for (i in 2:length(x))
> if (x[i])
> x[i] <- x[i-1] + 1
>
> The times in seconds for various solutions mailed to r-devel are listed
> below. There is some variation between runs and with the contents of x. The
> times shown are for
>
> set.seed(1066); x <- as.double(runif(1e6) > .5)
>
> This was tested on a WinXP 3 GHz Pentium D with Ra 1.0.7 (based on R 2.6.2).
> The code to generate these results is attached.
>
> vin 24
> greg 11
> had 3.9
> dan 1.4
> dan2 1.4
> jit 0.25 # code is shown above, 7 secs with standard R 2.6.2>

Stephen's solution is certainly easy to read and write.

Another solution, if I understand the scope of the problem, is

   f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}

I made a script to run all the functions I noticed (except for the library(inline) one) on various 0-1 vectors of length one million and got the following timings on R 2.6.2 running on my Windows laptop (Lenovo T61, Core 2 Duo at 2 GHz). "error thrown" means the function call died and "incorrect" means it returned the wrong answer.

   > source("z:/dumpdata.R")
   Timing stopped at: 0.05 0 0.05 NA NA
   Error in dots[[1L]][[1L]] : subscript out of bounds    Timing stopped at: 0.14 0.06 0.21 NA NA    Error in x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n)) :

     incompatible types (from NULL to double) in subassignment type fix
      all.ones     all.zeros    few.long  many.short
   f1    7.02         7.03         7.04      7.03
   f2    0.13         0.13         0.13      2.52
   f3 error thrown   35.47      incorrect incorrect
   f4    0.19      error thrown    0.21      1.20
   f5    0.28         0.09         0.29      0.18
   f6    5.40         0.78         5.42      3.14
   f7    0.06         0.05         0.06      0.06

I've attached the script so you can figure out whose function is whose if you care to. The lapply/mapply solution, f3, required that there be 1's at both ends of the input vector. Perhaps I miscopied the code.



Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do  not necessarily reflect Insightful Corporation policy or position."


The test script:

len <- 1e6 # length of vectors in tests

funs <- list(
`f1` = function(x)Reduce( function(x,y) x*y + y, x, accumulate=TRUE ),

`f2` = function(x)x * unlist(lapply(rle(x)$lengths, seq_len)),

`f3` = function(x){

    ind <- which(x == 0)
    unlist(lapply(mapply(seq, ind, c(tail(ind, -1) - 1, length(x))),

        function(y) cumsum(x[y]))) },

`f4` = function(x) {

    d <- diff(c(0,x,0))
    starts <- which(d == 1)
    ends <- which(d == -1)
    x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n))     x },

`f5` = function(x) {

    if (existsFunction("jit")) jit(1) else stop("no jit available")     if (length(x) > 1)

        for (i in 2:length(x))
            if (x[i])
                x[i] <- x[i-1] + 1

    x
    },
`f6` = # same as f5, but not compiled with jit.

    function(x) {

        if (existsFunction("jit")) jit(0)
        if (length(x) > 1)
            for (i in 2:length(x))
                if (x[i])
                    x[i] <- x[i-1] + 1
        x

    },
`f7` =

    function(x) {

        tmp<-cumsum(x)
        tmp-cummax((!x)*tmp)

    }
)

data <- list(

    all.ones = rep(1, len),
    all.zeros = rep(0, len),
    few.long = rep( c(rep(1,len/10-1),0), len=len), # 10 long runs
    many.short = rep(c(1,0), len=len) # len/2 runs of length 1 )
expected <- list(
    all.ones = 1:len,
    all.zeros = rep(0, len),
    few.long = rep( c(1:(len/10-1), 0), len=len),
    many.short = rep(c(1,0), len=len)
)

print(noquote(sapply(names(data),

   function(d) sapply(names(funs),

      function(f){
          time<-try(unix.time(gcFirst=TRUE, val<-funs[[f]](data[[d]])))
          if (is(time, "try-error"))
              "error thrown"
          else if (!isTRUE(report <- all.equal(val, expected[[d]])))
              "incorrect"
          else
              sprintf("%7.2f", unname(time[1]+time[2]))
      }

   )
)))



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Mon 14 Apr 2008 - 23:44:11 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 15 Apr 2008 - 01:31:10 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive