Re: [R] Speeding indexing and sub-sectioning of 3d array

From: Patrick Burns <pburns_at_pburns.seanet.com>
Date: Thu 10 Aug 2006 - 06:41:49 EST

First off, I hope that the function you list is just an example since it only returns what the last iteration does -- obviously the same answer can be arrived at much quicker.

The main principal in speeding up loops is to do as little inside the loops as possible. 'fjj1' is essentially the same as the listed function, but with one slight cleanup.

fjj1 <-
function(x, radius=3)
{

        dx <- dim(x)
        dx1 <- dx[1]
        dx2 <- dx[2]
        dx3 <- dx[3]
        for(i in (radius + 1):(dx1 - radius - 1)) {
                for(j in (radius + 1):(dx2 - radius - 1)) {
                        for(k in (radius + 1):(dx3 - radius -1)) {
                                ans <- mean(x[(i-radius):(i+radius),
                                        (j-radius):(j+radius), 
(k-radius):(k+radius)])
                        }
                }
        }
        ans

}

The time to run fjj1(jj, 3) on my machine where jj is a 245 by 175 by 150 array was 1222 seconds.

'fjj2' substantially reduces the number of sequences created. It took 975 seconds.

fjj2 <-
function(x, radius=3)
{

        dx <- dim(x)
        dx1 <- dx[1]
        dx2 <- dx[2]
        dx3 <- dx[3]
        rseq <- -radius:radius
        for(i in (radius + 1):(dx1 - radius - 1)) {
                for(j in (radius + 1):(dx2 - radius - 1)) {
                        for(k in (radius + 1):(dx3 - radius -1)) {
                                ans <- mean(x[i + rseq, j + rseq, k + rseq])
                        }
                }
        }
        ans

}

'fjj3' reduces some of the subscripting (but possibly at the expense of using more memory -- I'm not sure if it does or not). It took 936 seconds.

fjj3 <-
function(x, radius=3)
{

        dx <- dim(x)
        dx1 <- dx[1]
        dx2 <- dx[2]
        dx3 <- dx[3]
        rseq <- -radius:radius
        for(i in (radius + 1):(dx1 - radius - 1)) {
                A <- x[i + rseq, , , drop=FALSE]
                for(j in (radius + 1):(dx2 - radius - 1)) {
                        B <- A[, j + rseq, , drop=FALSE]
                        for(k in (radius + 1):(dx3 - radius -1)) {
                                ans <- mean(B[ , , k + rseq])
                        }
                }
        }
        ans

}

'fjj4' reverses the order of the loops. Because of the way that arrays are stored, it makes sense that subscripting a sequence in the first dimension would be faster than subscripting subsequent dimensions. This does seem to be the case. 'fjj4' took 839 seconds.

fjj4 <-
function(x, radius=3)
{

        dx <- dim(x)
        dx1 <- dx[1]
        dx2 <- dx[2]
        dx3 <- dx[3]
        rseq <- -radius:radius
        for(i in (radius + 1):(dx3 - radius - 1)) {
                A <- x[, ,i + rseq, drop=FALSE]
                for(j in (radius + 1):(dx2 - radius - 1)) {
                        B <- A[, j + rseq, , drop=FALSE]
                        for(k in (radius + 1):(dx1 - radius -1)) {
                                ans <- mean(B[k + rseq, , ])
                        }
                }
        }
        ans

}

Another change that would make a marginal difference would be to generate the sequences controlling the inner loops once at the outset.

If the computation at the heart of the function really is a mean or something similar, then it is possible that there will be tricks to update that value more efficiently.

Finally, if this will be used enough that the speed is an issue, then rewriting it in C would be a good approach.

Patrick Burns
patrick@burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Swidan, Firas wrote:

>Hi,
>
>I am having a problem with a very slow indexing and sub-sectioning of a 3d
>array:
>
>  
>
>>dim(arr)
>>    
>>
>[1] 245 175 150
>
>For each point in the array, I am trying to calculate the mean of the values
>in its surrounding:
>
>
>mean( arr[ (i - radius):(i + radius),
>                                (j - radius):(j + radius),
>                                (k - radius):(k + radius)] )
>
>Putting that code in 3 for loops
>
>calculateKMedian <- function( arr, radius){
>
>  for( i in (radius + 1):(dim(arr)[1] - radius - 1) ){
>    for( j in (radius + 1):(dim(arr)[2] - radius - 1) )
>      for( k in (radius + 1):(dim(arr)[3] - radius - 1) ){
>
>
>        mediansArr <- mean( arr[ (i - radius):(i + radius),
>                                (j - radius):(j + radius),
>                                (k - radius):(k + radius)] )
>
>      }
>  }
>  return(mediansArr)
>}
>
>Results in a very slow run:
>
>  
>
>>system.time( calculateKMedian( a, 3))
>>    
>>
>[1] 423.468   0.096 423.631   0.000   0.000
>
>If I replace 
>
>        mediansArr <- mean( arr[ (i - radius):(i + radius),
>                                (j - radius):(j + radius),
>                                (k - radius):(k + radius)] )
>
>With an access to the (I,j,k) cell's value
>
> mediansArr <- arr[i,j,k]
>
>The running time decreases to
>
>  
>
>>system.time( calculateKMedian( a, 3))
>>    
>>
>[1] 14.821  0.005 14.829  0.000  0.000
>
>
>
>But 14 seconds are still pretty expensive for just scanning the array.
>
>Is there anything I can do to speed the indexing and the sub-sectioning of
>the 3d array in this case?
>
>Thanks for the help,
>Firas.
>
>______________________________________________
>R-help@stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>
>  
>

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu Aug 10 06:45:01 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 11 Aug 2006 - 06:22:47 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.