Re: [R] fast way to compare two matrices of combinations

From: Charles C. Berry <cberry_at_tajo.ucsd.edu>
Date: Thu, 13 Mar 2008 11:10:56 -0700

On Thu, 13 Mar 2008, Mark W Kimpel wrote:

> I have a list (length 750), each element containing a vector of unique
> strings (unique gene ids), with length up to ~40 (median 15). I want to
> compile a matrix of all possible triplets and their frequency within
> gene elements. Using combn and a lot of looping, I am accomplishing this
> but it is VERY slow.
>
> I've tried to figure out a way to vectorize this, using "match" and
> "%in%", but can't get my mind around it.
>
> Below is my code. sig.tf.pairs is the list. Suggestions?

First, be sure that your code does what you really intend for it to do.

Does this really do what you wanted?

       if (length(intersect(triplets[,m], all.triplets[,k] == M))){

If so, then why does the first line below never produce an error?

          count.vec <- count.vec[,-redundant.vec]

         is.null(dim(count.vec)) ## TRUE

You are basically tabulating. Use the functions that are built for that.

It looks like what you want is along these lines:

 	tab.combns <- function(x) apply( combn( sort(x), M ),2,
         	                        function(x) paste(x,collapse=''))

 	tab.all <- table( unlist( lapply(sig.tf.pairs,tab.combns) ) )

Chuck
>
> Mark
>
>
> ############################################################
> M <- 3 # 3 for triplets, etc.
> ##########################################################
> # count all triplets
> all.triplets <- NULL
> all.count.vec <- NULL
> for (i in 1:length(sig.tf.pairs)){
> if (length(sig.tf.pairs[[i]] >= M)){
> triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
> for (j in 1:ncol(triplets)){
> o <- order(triplets[,j])
> triplets[,j] <- triplets[o,j]
> count.vec <- rep(1, ncol(triplets))
> }
> if (is.null(all.count.vec)){
> all.count.vec <- count.vec
> all.triplets <- triplets
> } else {
> redundant.vec <- NULL
> for (k in 1:ncol(all.triplets)){
> for (m in 1:ncol(triplets)){
> if (length(intersect(triplets[,m], all.triplets[,k] == M))){
> all.count.vec[k] <- all.count.vec[k] + 1
> redundant.vec <- c(redundant.vec, m)
> }
> }
> }
> if(!is.null(redundant.vec)){
> triplets <- triplets[,-redundant.vec]
> count.vec <- count.vec[,-redundant.vec]
> }
> all.triplets <- cbind(all.triplets, triplets)
> all.count.vec <- c(all.count.vec, count.vec)
> }
> }
> }
> ###################################
>
> --
>
> Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN 46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 204-4202 Home (no voice mail please)
>
> mwkimpel<at>gmail<dot>com
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 13 Mar 2008 - 18:15:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 14 Mar 2008 - 01:30:21 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive