Re: [R] On the speed of apply and alternatives?

From: François Pinard <pinard_at_iro.umontreal.ca>
Date: Tue 09 May 2006 - 21:51:00 EST

[Monty B.]

>I have to handle a large matrix (1000 x 10001) where in the
>last column I have a value that all the preceding values in the same row
>have to be compared to. I have written the following code:

># generate a (1000 x 10001) matrix, testm
># generate statistics matrix 1000 x 4:

>qnt <- c(0.01, 0.05)
>cmp_fun <- function(x)
>{
>    LAST <- length(x)
>    smpls <- x[1:(LAST-1)]
>    real <- x[LAST]
>
>    ret <- vector(length=length(qnt)*2)
>    for (i in 1:length(qnt))
>    {
>        q_i <- quantile(smpls, qnt[i])   # the quantile i
>        m_i <- mean(smpls[smpls < q_i])  # mean of obs less than q_i
>        ret[i] <- ifelse(real < q_i, 1, 0)
>        ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
>    }
>    ret
>}
>hcvx <- apply(testm, 1, cmp_fun)

>Can anyone advise as to how I can optimize the runtime of this problem?
>All suggestions are welcome!

You may speed it up a bit, though not by much, with the following:

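stats.testm <- function (testm, qnt=c(0.01, 0.05)) {

    # One column of quantiles per row of testm: length(qnt) x nrow(testm).
    quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt)

    # Repeat each row once per quantile, so that rows line up with quants.
    smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)]
    reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)]
    # Flag the samples below their quantile, then average them row by row.
    keeps <- smpls < rep(quants, ncol(smpls))
    means <- rowSums(smpls * keeps) / rowSums(keeps)
    # Indicators first, then differences, in the same order as cmp_fun.
    matrix(rbind((reals < quants) + 0,
                 (reals < quants) * (reals - means)),
           length(qnt) * 2)

}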

Try it with something like:

gen.testm <- function (n, m) {

    matrix(sample(0:99, n * (m + 1), TRUE), n)

}

testm <- gen.testm(100, 100)
stats.testm(testm)
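
To check that the vectorised version agrees with the row-wise one, a sketch along the following lines may help. It restates cmp_fun from the question and stats.testm from above so that it runs on its own; the seed, the matrix size and the use of rnorm (continuous values, so that ties are not an issue) are assumptions made for illustration only.

```r
qnt <- c(0.01, 0.05)

# Row-wise function from the question.
cmp_fun <- function (x) {
    LAST <- length(x)
    smpls <- x[1:(LAST-1)]
    real <- x[LAST]
    ret <- vector(length=length(qnt)*2)
    for (i in 1:length(qnt)) {
        q_i <- quantile(smpls, qnt[i])
        m_i <- mean(smpls[smpls < q_i])
        ret[i] <- ifelse(real < q_i, 1, 0)
        ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
    }
    ret
}

# Vectorised function from the reply.
stats.testm <- function (testm, qnt=c(0.01, 0.05)) {
    quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt)
    smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)]
    reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)]
    keeps <- smpls < rep(quants, ncol(smpls))
    means <- rowSums(smpls * keeps) / rowSums(keeps)
    matrix(rbind((reals < quants) + 0,
                 (reals < quants) * (reals - means)),
           length(qnt) * 2)
}

# Assumed test data: 200 rows, 100 samples plus one value to compare.
set.seed(1)
testm <- matrix(rnorm(200 * 101), 200)

slow <- apply(testm, 1, cmp_fun)
fast <- stats.testm(testm)

# Both are (2 * length(qnt)) x nrow(testm); compare the numbers only.
agree <- isTRUE(all.equal(slow, fast, check.attributes=FALSE))
print(agree)

# Rough timings; they vary from machine to machine.
print(system.time(apply(testm, 1, cmp_fun)))
print(system.time(stats.testm(testm)))
```

One caveat: with heavily tied data, such as the integers from gen.testm, a row may have no value below its quantile. The mean is then NaN, and the vectorised version may return NaN where cmp_fun returns 0, so the comparison can fail on such rows.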

Without checking, I would suspect that quantile is the big consumer. If you could do without quantile interpolation, some more vectorisation might be possible; in any case, I do not think you can avoid sorting each row separately, one way or another (this is currently done within quantile).
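
If an order statistic is acceptable in place of the interpolated quantile, the idea might be sketched as follows: sort each row once, read the quantiles off the sorted rows, and get the means of the smaller values from running sums. The name stats.sorted, the choice of the ceiling(n * p)-th smallest value as the quantile, and the assumption of no ties and no missing values within a row are all illustrative. The results will generally differ from those obtained with quantile's default interpolation.

```r
# Hypothetical variant: no interpolation, one sort per row.
# Assumes no ties or NAs within a row, and at least two sample columns.
stats.sorted <- function (testm, qnt=c(0.01, 0.05)) {

    n <- ncol(testm) - 1
    # Index of the order statistic standing in for each quantile.
    k <- pmax(ceiling(n * qnt), 1)

    # apply() returns each sorted row as a column: sorted is n x nrow(testm).
    sorted <- apply(testm[, 1:n], 1, sort)
    quants <- sorted[k, , drop=FALSE]

    # Running sums with a leading row of zeros, so that row k holds the sum
    # of the k - 1 smallest values; dividing by k - 1 gives their mean.
    sums <- rbind(0, apply(sorted, 2, cumsum))
    means <- sums[k, , drop=FALSE] / (k - 1)

    # Last column, laid out as length(qnt) x nrow(testm) to match quants.
    reals <- matrix(testm[, n + 1], length(qnt), nrow(testm), byrow=TRUE)
    below <- reals < quants

    # Indicators first, then differences (0 where the value is not below).
    rbind(below + 0, ifelse(below, reals - means, 0))

}

# Assumed test data: 50 rows, 200 samples plus one value to compare.
set.seed(2)
testm <- matrix(rnorm(50 * 201), 50)
res <- stats.sorted(testm)
print(dim(res))
```

Each row is still sorted separately here; what goes away is the repeated call to quantile and the row replication, at the price of a different quantile definition.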

-- 
François Pinard   http://pinard.progiciels-bpi.ca

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue May 09 21:55:52 2006

