Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Wed, 26 Nov 2008 10:08:56 -0500

He might try rcorr from Hmisc instead. Using your test suite, it gives about a 20% improvement on my MacPro:

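 > library(Hmisc)   # rcorr() comes from the Hmisc package
 > m1 <- matrix(rnorm(10000), ncol=100)
 > m2 <- matrix(rnorm(10000), ncol=100)
 > Rprof('/tempxx.txt')
 > # rcorr(x,y)$P is a 2 x 2 matrix; the p-value is the off-diagonal element
 > system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
 +     function(y) { rcorr(x,y)$P[1,2] }) }))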

    user system elapsed
   4.221 0.049 4.289

 > m1 <- matrix(rnorm(10000), ncol=100)
 > m2 <- matrix(rnorm(10000), ncol=100)
 > Rprof('/tempxx.txt')
 > system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
 +     function(y) { cor.test(x,y)$p.value }) }))

    user system elapsed
   5.328 0.038 5.355

I'm not a smart enough programmer to figure out whether there might be an even more efficient method that takes advantage of rcorr's implicit "looping" through a set of columns to produce an all-combinations return.
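
One possible vectorized route, offered only as a sketch: base R's cor() accepts two matrices and returns every column-by-column correlation in one call, and the two-sided Pearson p-value can then be computed from r with the usual t statistic on n - 2 degrees of freedom. The sketch below assumes Pearson correlation, no missing values, and the small 100 x 100 test matrices used above; the names n, r, tstat and pvals are illustrative and not from the original thread.

```r
# Sketch: all-pairs Pearson p-values without one cor.test() call per pair.
# Assumes complete data (no NAs) and the default two-sided Pearson test.
set.seed(1)
m1 <- matrix(rnorm(10000), ncol = 100)
m2 <- matrix(rnorm(10000), ncol = 100)

n <- ncol(m1)                            # observations per row
r <- cor(t(m1), t(m2))                   # r[i, j] = cor(m1[i, ], m2[j, ])
tstat <- r * sqrt((n - 2) / (1 - r^2))   # t statistic for each correlation
pvals <- 2 * pt(-abs(tstat), df = n - 2) # two-sided p-values

# Spot check one pair against cor.test()
all.equal(pvals[3, 7], cor.test(m1[3, ], m2[7, ])$p.value)
```

Here pvals[i, j] refers to row i of m1 and row j of m2, so t(pvals) has the same layout as the result of the nested apply calls. For matrices the size of the original problem the full result would have 1,000 x 100,000 entries (roughly 800 MB of doubles), so it may be necessary to process m2 in blocks of rows.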

-- 
David Winsemius, MD
Heritage Labs


On Nov 26, 2008, at 9:14 AM, jim holtman wrote:


> Your time is being taken up in cor.test because you are calling it
> 100,000,000 times (1,000 rows of m1 against 100,000 rows of m2). So
> grin and bear it with the amount of work you are asking it to do.
>
> Here I am only calling it 10,000 times (100 rows against 100 rows):
>
>> m1 <- matrix(rnorm(10000), ncol=100)
>> m2 <- matrix(rnorm(10000), ncol=100)
>> Rprof('/tempxx.txt')
>> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
>> function(y) { cor.test(x,y)$p.value }) }))
> user system elapsed
> 8.86 0.00 8.89
>>
>
> so my guess is that calling it 100,000,000 times will take about
> 100,000,000 * 0.000886 seconds, or roughly 25 hours.
>
> If you run Rprof, you will see it is spending most of its time there:
>
> 0 8.8 root
> 1. 8.8 apply
> 2. . 8.8 FUN
> 3. . . 8.8 apply
> 4. . . . 8.7 FUN
> 5. . . . . 8.6 cor.test
> 6. . . . . . 8.4 cor.test.default
> 7. . . . . . . 2.4 match.arg
> 8. . . . . . . . 1.7 eval
> 9. . . . . . . . . 1.4 deparse
> 10. . . . . . . . . . 0.6 .deparseOpts
> 11. . . . . . . . . . . 0.2 pmatch
> 11. . . . . . . . . . . 0.1 sum
> 10. . . . . . . . . . 0.5 %in%
> 11. . . . . . . . . . . 0.3 match
> 12. . . . . . . . . . . . 0.3 is.factor
> 13. . . . . . . . . . . . . 0.3 inherits
> 8. . . . . . . . 0.2 formals
> 9. . . . . . . . . 0.2 sys.function
> 7. . . . . . . 2.1 cor
> 8. . . . . . . . 1.1 match.arg
> 9. . . . . . . . . 0.7 eval
> 10. . . . . . . . . . 0.6 deparse
> 11. . . . . . . . . . . 0.3 .deparseOpts
> 12. . . . . . . . . . . . 0.1 pmatch
> 11. . . . . . . . . . . 0.2 %in%
> 12. . . . . . . . . . . . 0.2 match
> 13. . . . . . . . . . . . . 0.1 is.factor
> 14. . . . . . . . . . . . . . 0.1 inherits
> 9. . . . . . . . . 0.1 formals
> 8. . . . . . . . 0.5 stopifnot
> 9. . . . . . . . . 0.2 match.call
> 8. . . . . . . . 0.1 pmatch
> 8. . . . . . . . 0.1 is.data.frame
> 9. . . . . . . . . 0.1 inherits
> 7. . . . . . . 1.5 paste
> 8. . . . . . . . 1.4 deparse
> 9. . . . . . . . . 0.6 .deparseOpts
> 10. . . . . . . . . . 0.3 pmatch
> 10. . . . . . . . . . 0.1 any
> 9. . . . . . . . . 0.6 %in%
> 10. . . . . . . . . . 0.6 match
> 11. . . . . . . . . . . 0.5 is.factor
> 12. . . . . . . . . . . . 0.4 inherits
> 13. . . . . . . . . . . . . 0.2 mode
> 7. . . . . . . 0.4 switch
> 8. . . . . . . . 0.1 qnorm
> 7. . . . . . . 0.2 pt
> 5. . . . . 0.1 $
>
> On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan <daren76_at_hotmail.com>
> wrote:
>>
>> My two matrices are roughly the sizes of m1 and m2. I tried using
>> two nested apply calls with cor.test to compute the correlation
>> p-values. After more than an hour, the code is still running.
>> Please help me make it more efficient.
>>
>> m1 <- matrix(rnorm(100000), ncol=100)
>> m2 <- matrix(rnorm(10000000), ncol=100)
>>
>> cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y)
>> { cor.test(x,y)$p.value }) })
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
Received on Wed 26 Nov 2008 - 15:11:03 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 26 Nov 2008 - 15:30:27 GMT.
