Re: [R] Speedups with Ra and jit

From: Luke Tierney <luke_at_stat.uiowa.edu>
Date: Sun, 04 May 2008 13:18:21 -0500 (CDT)

A couple of comments on this and the original thread.

As pointed out by several posters, in a vectorized language like R one can usually create the fastest and cleanest code by using vectorized operations. This applies to R as well as Matlab.

That said, there are at times reasons for using code consisting of a set of tight nested loops, such as

     it may not be obvious how to rewrite the loops using
     vectorization

     the code may match a published algorithm and so verifying its
     correctness and maintaining it in this form may be easier than
     rewriting

     the code may not be fast or pretty but it works and gets the right
     answer

Given these points it would be nice if R's interpreter could do a better job. R's interpreter is currently rather slow on these sorts of computations. For example, a direct translation of the original poster's grw_permute example runs about 5 times as fast in the xlispstat interpreter (similar internals in many ways). There are some things we could do involving relatively little effort to marginally improve R's interpreter, but significant improvements would likely require a fairly significant rewrite (which may be worth considering at some point but probably not soon).

An alternative is to pursue automated tools to transform this sort of code into faster code. This is in part the motivation for creating a byte code compiler for R, and for the Ra/jit project. The current byte code compiler (http://www.stat.uiowa.edu/~luke/R/compiler/) speeds up this computation by about a factor of 3. This compiler does not yet optimize variable lookup or indexing of vectors or matrices; these should be added sometime this summer and should add another factor of 3 or so for this example.

Steve has done some very interesting things in his Ra/jit project. Some of the optimizations done there are either already done in the byre code compiler or have been planned to be added for a while. Others, in particular specialization for basic data types may be best done at run time as jit does, and there may be room for merging these ideas with the static byte code compiler.

The particular specialization approach used in jit means that some code will produce different results or generate errors; i.e. a user who requests jit compilation is implicitly agreeing not to try to do certain things, such as change types or sizes of values stored in variables used in the compilation. In the long run I would prefer either a mechanism where such assumptions are declared explicitly by the user or to arrange for R to automatically switch back to less optimized code when the assumptions of the optimization are violated.

I believe the main aspect of runtime specialization done in jit now that may be hard to match in statically compiled byte code is that a function defined as

     f <- function(x) {
         jit()
         s <- 0
         for (i in seq_along(x)) s <- s + x[i]
         s
     }

will be optimized for integer data when run on integer x and on for real data when run on real x, and in both cases allocation of intermediate results is avoided. How valuable this is in the long run is not yet clear -- it would definitely be very helpful if machine code was being generated that also allowed intermediate values to stay in registers (which I believe is what psyco does), but that is messy and hard to do across many platforms. With fast allocation the benefits of avoiding allocation alone may not be that substantial. For example, the byte compiled xlispstat version of the grw_protect example mentioned above runs about twice as fast at the Ra/jit one without avoiding intermediate allocation. This isn't conclusive of course and it will be interesting to do some more careful tests and see what directions those suggest.

Best,

luke

On Fri, 2 May 2008, Bill.Venables_at_csiro.au wrote:

> The topic of Ra and jit has come up on this list recently
>
> (see http://www.milbo.users.sonic.net/ra/index.html)
>
> so I thought people might be interested in this little demo. For it I
> used my machine, a 3-year old laptop with 2Gb memory running Windows
> XP, and the good old convolution example, the same one as used on the
> web page, (though the code on the web page has a slight glitch in it).
>
> This is using Ra with R-2.7.0.
>
>> conv1 <- function(a, b) {
>> ### with Ra and jit
> require(jit)
> jit(1)
> ab <- numeric(length(a)+length(b)-1)
> for(i in 1:length(a))
> for(j in 1:length(b))
> ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]
> ab
> }
>>
>> conv2 <- function(a, b) {
>> ### with just Ra
> ab <- numeric(length(a)+length(b)-1)
> for(i in 1:length(a))
> for(j in 1:length(b))
> ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]
> ab
> }
>>
>> x <- 1:2000
>> y <- 1:500
>> system.time(tst1 <- conv1(x, y))
> user system elapsed
> 0.53 0.00 0.55
>> system.time(tst2 <- conv2(x, y))
> user system elapsed
> 9.49 0.00 9.56
>> all.equal(tst1, tst2)
> [1] TRUE
>>
>> 9.56/0.55
> [1] 17.38182
>>
>
> However for this example you can achieve speed-ups like that or better
> just using vectorised code intelligently:
>
>> conv3 <- local({
> conv <- function(a, b, na, nb) {
> r <- numeric(na + nb -1)
> ij <- 1:nb
> for(e in a) {
> r[ij] <- r[ij] + e*b
> ij <- ij + 1
> }
> r
> }
>
> function(a, b) {
> na <- length(a)
> nb <- length(b)
> if(na < nb) conv(a, b, na, nb) else
> conv(b, a, nb, na)
> }
> })
>>
>> system.time(tst3 <- conv3(x, y))
> user system elapsed
> 0.11 0.00 0.11
>> all.equal(tst1, tst3)
> [1] TRUE
>
>> 0.55/0.11
> [1] 5
>> 9.56/0.11
> [1] 86.90909
>
> ie. a further 5-fold increase in speed, or about 87 times faster than
> the unassisted naïve code.
>
> I think the lesson here is if you really want to write R code as you
> might C code, then jit can help make it practical in terms of time.
> On the other hand, if you want to write R code using as much of the
> inbuilt operators as you have, then you can possibly still do things
> better.
>
> Of course sometimes you don't have the right inbuilt operators. In
> that case you have a three-way choice: slow R code and wait, faster R
> code speeded up with Ra and jit, or, (the way it probably should be
> done), with dynamically loaded C or Fortran code. Portability
> decreases as you go, of course.
>
>
> Bill Venables
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke_at_stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Sun 04 May 2008 - 18:42:31 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 04 May 2008 - 19:30:35 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive