[R] fast way to compare two matrices of combinations

From: Mark W Kimpel <mwkimpel_at_gmail.com>
Date: Thu, 13 Mar 2008 12:23:57 -0400


I have a list (length 750), each element of which contains a vector of unique strings (unique gene ids) of length up to ~40 (median 15). I want to compile a matrix of all possible triplets together with their frequencies, i.e., the number of list elements in which each triplet occurs. Using combn and a lot of looping I am accomplishing this, but it is VERY slow.
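A minimal sketch of one common vectorized alternative (not the poster's method): encode each sorted triplet as a single string key, pool the keys from every element, and count them with one call to table(). The toy sig.tf.pairs, the helper triplet.keys(), and the "|" separator are all hypothetical; the sketch assumes no gene id contains "|". Because the ids within an element are unique, each triplet occurs at most once per element, so its count equals the number of elements that contain it.

```r
# Toy stand-in for sig.tf.pairs (hypothetical data, not from the post)
sig.tf.pairs <- list(c("g1", "g2", "g3", "g4"),
                     c("g2", "g3", "g4"),
                     c("g5", "g1"))
M <- 3

# Hypothetical helper: sort the ids, then turn each M-combination into one
# key such as "g1|g2|g3". combn() keeps the input order, so every key lists
# its ids in sorted order and the same set always yields the same key.
triplet.keys <- function(ids, M) {
  ids <- sort(unique(ids))
  if (length(ids) < M) return(character(0))
  combn(ids, M, FUN = paste, collapse = "|")
}

# Keys from every element, pooled into one vector and counted in one pass
all.keys <- unlist(lapply(sig.tf.pairs, triplet.keys, M = M))
triplet.counts <- table(all.keys)

# Back to matrix form: one row per distinct triplet, in the order of
# names(triplet.counts)
triplet.mat <- do.call(rbind, strsplit(names(triplet.counts), "|", fixed = TRUE))
print(triplet.counts)
```

Here the third element has fewer than M ids and contributes nothing, while "g2|g3|g4" is counted twice because it occurs in the first two elements.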

I've tried to figure out a way to vectorize this using "match" and "%in%", but can't get my head around it.
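A minimal sketch of how match() could replace the k/m double loop, assuming each triplet is stored as a single sorted, "|"-joined string key rather than as a matrix column (an assumption; the post stores columns). The running tallies and the incoming keys below are hypothetical. `new.keys %in% all.keys` gives the same information as `!is.na(match(new.keys, all.keys))`, but match() also returns the positions needed to update the counts.

```r
# Hypothetical running tallies: one key per triplet seen so far, plus counts
all.keys <- c("g1|g2|g3", "g1|g2|g4", "g2|g3|g4")
all.count.vec <- c(1, 1, 1)

# Keys from the next list element (same "sorted ids joined by |" encoding)
new.keys <- c("g2|g3|g4", "g2|g3|g5", "g2|g4|g5")

# Position of each new key in all.keys, or NA if it has not been seen yet
idx <- match(new.keys, all.keys)
seen <- !is.na(idx)

# Keys within one element are distinct, so idx[seen] has no repeats and a
# single vectorized increment is safe
all.count.vec[idx[seen]] <- all.count.vec[idx[seen]] + 1

# Unseen keys are appended with a count of 1
all.keys <- c(all.keys, new.keys[!seen])
all.count.vec <- c(all.count.vec, rep(1, sum(!seen)))
```

After this step "g2|g3|g4" has a count of 2 and the two unseen keys have been appended with a count of 1 each.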

Below is my code. sig.tf.pairs is the list. Suggestions?

Mark

############################################################
M <- 3 # 3 for triplets, etc.
##########################################################
# count all triplets
all.triplets <- NULL
all.count.vec <- NULL
for (i in seq_along(sig.tf.pairs)){

   if (length(sig.tf.pairs[[i]]) >= M){

     # one column per M-combination of this element's gene ids
     triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
     # sort each column so a given set always appears in the same order
     for (j in 1:ncol(triplets)){
       o <- order(triplets[,j])
       triplets[,j] <- triplets[o,j]
     }
     count.vec <- rep(1, ncol(triplets))

     if (is.null(all.count.vec)){
       all.count.vec <- count.vec
       all.triplets <- triplets
     } else {
       # bump the count of every stored triplet that reappears here
       redundant.vec <- NULL
       for (k in 1:ncol(all.triplets)){
         for (m in 1:ncol(triplets)){
           if (length(intersect(triplets[,m], all.triplets[,k])) == M){
             all.count.vec[k] <- all.count.vec[k] + 1
             redundant.vec <- c(redundant.vec, m)
           }
         }
       }
       # keep only the new triplets (drop = FALSE keeps a matrix)
       if (!is.null(redundant.vec)){
         triplets <- triplets[, -redundant.vec, drop = FALSE]
         count.vec <- count.vec[-redundant.vec]
       }
       all.triplets <- cbind(all.triplets, triplets)
       all.count.vec <- c(all.count.vec, count.vec)
     }

   }
}
###################################
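A small sketch showing that the column-by-column sort loop can be dropped: combn() emits each combination in the order its elements appear in the input, so sorting the ids once before calling combn() already yields sorted columns. The ids below are hypothetical. The two results contain the same triplets, though possibly in a different column order.

```r
ids <- c("Tf3", "Tf1", "Tf2", "Tf4")   # hypothetical gene ids, unsorted

# Loop approach: combn first, then sort every column
triplets.loop <- combn(ids, 3)
for (j in 1:ncol(triplets.loop)) {
  triplets.loop[, j] <- sort(triplets.loop[, j])
}

# Sort once: every column produced by combn() is then already sorted
triplets.once <- combn(sort(ids), 3)

# Same set of triplets, compared as "|"-joined keys
keys.loop <- apply(triplets.loop, 2, paste, collapse = "|")
keys.once <- apply(triplets.once, 2, paste, collapse = "|")
print(setequal(keys.loop, keys.once))
```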

-- 

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)

mwkimpel<at>gmail<dot>com

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 13 Mar 2008 - 16:27:28 GMT
