[R] Speeding up resampling of rows from a large matrix

From: Juan Pablo Lewinger <lewinger_at_usc.edu>
Date: Thu, 24 May 2007 23:04:28 -0700


I'm trying to:

Resample with replacement pairs of distinct rows from a 120 x 65,000 matrix H of 0's and 1's. For each resampled pair sum the resulting 2 x 65,000 matrix by column:

     0 1 0 1 ...
+

     0 0 1 1 ...


For each column accumulate the number of 0's, 1's and 2's over the resamples to obtain a 3 x 65,000 matrix G.

For those interested in the background, H is a matrix of haplotypes, each pair of haplotypes forms a genotype, and each column corresponds to a SNP. I'm using resampling to compute the null distribution of the maximum over correlated SNPs of a simple statistic.

The code:

#-------------------------------------------------------------------------------
nSNPs <- 1000
H <- matrix(sample(0:1, 120*nSNPs , replace=T), nrow=120) G <- matrix(0, nrow=3, ncol=nSNPs)
# Keep in mind that the real H is 120 x 65000

nResamples <- 3000
pair <- replicate(nResamples, sample(1:120, 2))

gen <- function(x){g <- sum(x); c(g==0, g==1, g==2)}

for (i in 1:nResamples){

    G <- G + apply(H[pair[,i],], 2, gen) }

#-------------------------------------------------------------------------------
The problem is that the loop takes about 80 mins to complete and I need to repeat the whole thing 10,000 times, which would then take over a year and a half!

Is there a way to speed this up so that the full 10,000 iterations take a reasonable amount of time (say a week)?

My machine has an Intel Xeon 3.40GHz CPU with 1GB of RAM

> sessionInfo()

R version 2.5.0 (2007-04-23)
i386-pc-mingw32

I would greatly appreciate any help.

Juan Pablo Lewinger
Department of Preventive Medicine
Keck School of Medicine
University of Southern California



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 25 May 2007 - 06:16:32 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 25 May 2007 - 09:31:41 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.