Re: [R] fast way to compare two matrices of combinations

From: Erik Iverson <iverson_at_biostat.wisc.edu>
Date: Thu, 13 Mar 2008 11:27:58 -0500

Hello Mark -

It may help if you provide a (small) set of example input and what you'd like as your output.
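
For instance, a toy input of the shape you describe might look something like the sketch below. The gene ids and the name toy.pairs are made up for illustration and are not taken from your data.

```r
# Hypothetical toy stand-in for sig.tf.pairs: a list of character
# vectors, each holding unique gene ids (ids invented for illustration).
toy.pairs <- list(
  c("g1", "g2", "g3", "g4"),
  c("g2", "g3", "g4"),
  c("g1", "g5")   # shorter than 3, so it contributes no triplets
)

# All triplets from the first element, one triplet per column
first.triplets <- combn(sort(toy.pairs[[1]]), 3)
first.triplets
```

With input like this, I would guess the desired output lists each distinct triplet once together with the number of list elements it occurs in, so that g2/g3/g4 gets a count of 2 and the other three triplets a count of 1.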

Best,
Erik Iverson

Mark W Kimpel wrote:
> I have a list (length 750), each element containing a vector of unique
> strings (unique gene ids), with length up to ~40 (median 15). I want to
> compile a matrix of all possible triplets and their frequency within
> gene elements. Using combn and a lot of looping, I am accomplishing this
> but it is VERY slow.
>
> I've tried to figure out a way to vectorize this, using "match" and
> "%in%", but can't get my mind around it.
>
> Below is my code. sig.tf.pairs is the list. Suggestions?
>
> Mark
>
>
> ############################################################
> M <- 3 # 3 for triplets, etc.
> ##########################################################
> # count all triplets
> all.triplets <- NULL
> all.count.vec <- NULL
> for (i in seq_along(sig.tf.pairs)){
>   if (length(sig.tf.pairs[[i]]) >= M){
>     triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
>     # sort within each column so a triplet always has the same order
>     for (j in 1:ncol(triplets)){
>       o <- order(triplets[,j])
>       triplets[,j] <- triplets[o,j]
>     }
>     count.vec <- rep(1, ncol(triplets))
>     if (is.null(all.count.vec)){
>       all.count.vec <- count.vec
>       all.triplets <- triplets
>     } else {
>       redundant.vec <- NULL
>       for (k in 1:ncol(all.triplets)){
>         for (m in 1:ncol(triplets)){
>           # same M ids in both columns means the triplet was already seen
>           if (length(intersect(triplets[,m], all.triplets[,k])) == M){
>             all.count.vec[k] <- all.count.vec[k] + 1
>             redundant.vec <- c(redundant.vec, m)
>           }
>         }
>       }
>       if (!is.null(redundant.vec)){
>         # drop = FALSE keeps a matrix even when one column remains
>         triplets <- triplets[, -redundant.vec, drop = FALSE]
>         count.vec <- count.vec[-redundant.vec]
>       }
>       all.triplets <- cbind(all.triplets, triplets)
>       all.count.vec <- c(all.count.vec, count.vec)
>     }
>   }
> }
> ###################################
>
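
One possible way to avoid the nested comparison loops, offered only as a sketch: turn each sorted triplet into a single string key and let table() do the counting. The function name count.subsets, the "|" separator, and the toy data are assumptions made for illustration. The sketch also assumes that no gene id contains "|" and that ids are unique within each list element, as described above.

```r
# Hypothetical helper: count how often each M-subset of ids occurs
# across the elements of a list of character vectors.
count.subsets <- function(x, M = 3) {
  # keep only elements long enough to yield at least one M-subset
  x <- x[lengths(x) >= M]
  keys <- unlist(lapply(x, function(ids) {
    # sorting first gives every subset one canonical ordering,
    # so the same triplet always produces the same key
    combn(sort(ids), M, FUN = paste, collapse = "|")
  }))
  counts <- table(keys)
  data.frame(subset = names(counts),
             count = as.vector(counts),
             stringsAsFactors = FALSE)
}

# Toy input (made-up ids) standing in for sig.tf.pairs
toy.pairs <- list(
  c("g1", "g2", "g3", "g4"),
  c("g4", "g3", "g2"),
  c("g1", "g5")
)
res <- count.subsets(toy.pairs, M = 3)
res[order(-res$count), ]
```

Because each list element is processed once and the tallying is left to table(), the work grows with the total number of triplets generated rather than with the product of new and previously seen triplets. If a matrix of ids is needed afterwards, strsplit(res$subset, "|", fixed = TRUE) should recover the individual ids.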



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 13 Mar 2008 - 16:30:44 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 13 Mar 2008 - 17:30:21 GMT.
