# Re: [R] fast way to compare two matrices of combinations

From: Charles C. Berry <cberry_at_tajo.ucsd.edu>
Date: Thu, 13 Mar 2008 11:10:56 -0700

> I have a list (length 750), each element containing a vector of unique
> strings (unique gene ids), with length up to ~40 (median 15). I want to
> compile a matrix of all possible triplets and their frequency within
> gene elements. Using combn and a lot of looping, I am accomplishing this
> but it is VERY slow.
>
> I've tried to figure out a way to vectorize this, using "match" and
> "%in%", but can't get my mind around it.
>
> Below is my code. sig.tf.pairs is the list. Suggestions?

Does this really do what you wanted?

if (length(intersect(triplets[,m], all.triplets[,k] == M))){

If so, then why does the first line below never produce an error?

count.vec <- count.vec[,-redundant.vec]

is.null(dim(count.vec)) ## TRUE
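
A small check of those two lines, using made-up gene ids (`new.trip` and `old.trip` are hypothetical stand-ins for `triplets[,m]` and `all.triplets[,k]`). As posted, the `== M` sits inside the call to `intersect`, so the second argument is a logical vector and the intersection comes out empty. The `if` is then never true, `redundant.vec` stays `NULL`, and the matrix-style subscript on a plain vector is never reached. The "presumably intended" form is only a guess at what was meant:

```r
M <- 3
new.trip <- c("g1", "g2", "g3")   # hypothetical stand-in for triplets[,m]
old.trip <- c("g1", "g2", "g3")   # hypothetical stand-in for all.triplets[,k]

## As posted: '== M' is inside intersect(), so the second argument is logical
as.posted <- length(intersect(new.trip, old.trip == M))
as.posted                          # 0, so the if() is never TRUE

## Presumably intended: compare the size of the intersection with M
as.intended <- length(intersect(new.trip, old.trip)) == M
as.intended                        # TRUE for identical triplets

## count.vec is a plain vector, so a matrix-style subscript would fail
count.vec <- rep(1, 4)
is.null(dim(count.vec))            # TRUE
subscript.fails <- inherits(try(count.vec[, -1], silent = TRUE), "try-error")
```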

You are basically tabulating. Use the functions that are built for that.

It looks like what you want is along these lines:

```
tab.combns <- function(x)
    apply( combn( sort(x), M ), 2, function(x) paste(x, collapse='') )

tab.all <- table( unlist( lapply(sig.tf.pairs, tab.combns) ) )
```
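
For what it is worth, here is a minimal sketch of the same tabulation run on a made-up list (`toy.pairs` and its gene ids are invented for illustration, and `tab.combns.safe` is a hypothetical variant, not the function above). It adds two precautions that are assumptions about the data rather than anything stated in the thread: elements shorter than `M` are skipped, since `combn` stops with an error when asked for more items than it is given, and a separator is used in `paste` so that ids such as `ab`, `c` and `a`, `bc` cannot collapse to the same key.

```r
M <- 3

## made-up data standing in for sig.tf.pairs
toy.pairs <- list(c("g3", "g1", "g2", "g4"),
                  c("g2", "g1", "g3"),
                  c("g1", "g5"))          # shorter than M, contributes nothing

## hypothetical variant of tab.combns: skips short elements, uses a separator
tab.combns.safe <- function(x, m = M, sep = "|") {
    if (length(x) < m) return(character(0))
    combn(sort(x), m, FUN = paste, collapse = sep)
}

tab.toy <- table(unlist(lapply(toy.pairs, tab.combns.safe)))
tab.toy[["g1|g2|g3"]]    # 2: that triplet occurs in the first two elements
```

Sorting before `combn` is what makes the key independent of the order in which the ids appear within an element, so the table counts each triplet once per element that contains it.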

Chuck
>
> Mark
>
>
> ############################################################
> M <- 3 # 3 for triplets, etc.
> ##########################################################
> # count all triplets
> all.triplets <- NULL
> all.count.vec <- NULL
> for (i in 1:length(sig.tf.pairs)){
>   if (length(sig.tf.pairs[[i]] >= M)){
>     triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
>     for (j in 1:ncol(triplets)){
>       o <- order(triplets[,j])
>       triplets[,j] <- triplets[o,j]
>       count.vec <- rep(1, ncol(triplets))
>     }
>     if (is.null(all.count.vec)){
>       all.count.vec <- count.vec
>       all.triplets <- triplets
>     } else {
>       redundant.vec <- NULL
>       for (k in 1:ncol(all.triplets)){
>         for (m in 1:ncol(triplets)){
>           if (length(intersect(triplets[,m], all.triplets[,k] == M))){
>             all.count.vec[k] <- all.count.vec[k] + 1
>             redundant.vec <- c(redundant.vec, m)
>           }
>         }
>       }
>       if(!is.null(redundant.vec)){
>         triplets <- triplets[,-redundant.vec]
>         count.vec <- count.vec[,-redundant.vec]
>       }
>       all.triplets <- cbind(all.triplets, triplets)
>       all.count.vec <- c(all.count.vec, count.vec)
>     }
>   }
> }
> ###################################
>
> --
>
> Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine

```
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
```

Received on Thu 13 Mar 2008 - 18:15:13 GMT


Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.