Re: [R] choosing best 'match' for given factor

From: Bert Gunter <gunter.berton_at_gene.com>
Date: Thu, 31 Mar 2011 09:21:04 -0700

Folks:

I think the following may be somewhat faster, as it avoids sorting:

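bmat <- function(mx, vec)
{
  nm <- colnames(mx)
  ivec <- match(vec, nm)  ## column positions of the search variables
  sapply(ivec, function(k) {
    ## only those to left and not in search vector ##
    lookat <- setdiff(seq_len(k - 1), ivec)
    ## no candidates: first column, or everything to the left is in vec
    if (length(lookat) == 0) NA_character_
    else nm[lookat[which.max(mx[lookat, k])]]
  })
}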
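
As a quick check, here is a minimal, self-contained sketch run on the sample matrix from the original question quoted below. It uses the same left-of-and-not-in-search-vector idea; the name best_left_match and the list searches are hypothetical, added only to illustrate handling several search vectors in one pass.

```r
# Sample 'matching' matrix from the original question (symmetric).
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
                 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1),
               .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"),
                                c("A", "X", "L", "O")))

# Hypothetical helper: for each search variable, consider only the
# variables to its left that are not in the search vector, and return
# the name of the best match (NA if no candidate remains).
best_left_match <- function(mx, vec) {
  nm <- colnames(mx)
  ivec <- match(vec, nm)
  vapply(ivec, function(k) {
    lookat <- setdiff(seq_len(k - 1), ivec)
    if (length(lookat) == 0) return(NA_character_)
    nm[lookat[which.max(mx[lookat, k])]]
  }, character(1))
}

# Hypothetical list of search vectors; the first is the one from the question.
searches <- list(c("X", "O"), c("L", "O"), "A")
results <- lapply(searches, best_left_match, mx = a)

print(results[[1]])  # "A" "L", the answer expected in the original question
```

The third search vector, "A", has nothing to its left, so its result is NA rather than an error.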

On Thu, Mar 31, 2011 at 8:30 AM, Nick Sabbe <nick.sabbe_at_ugent.be> wrote:

>
> Hi Murali.
> I haven't compared, but this is what I would do:
>
> bestMatch<-function(searchVector, matchMat)
> {
>        #if you're sure, you could drop unique
>        searchRow<-unique(sort(match(searchVector, colnames(matchMat))))
>        cat("Original row indices:")
>        print(searchRow)
>        #avoid duplicates altogether
>        matchMat<-matchMat[, -searchRow, drop=FALSE]
>        cat("Corrected Matrix:\n")
>        print(matchMat)
>        #number of remaining columns to the left; works because of the sort above
>        nLeft<-searchRow - seq_along(searchRow)
>        cat("Remaining columns to the left:")
>        print(nLeft)
>        #rows were not removed, so index rows by the original searchRow;
>        #results follow the sorted column order, not the order of searchVector
>        mapply(function(sr, nl){
>                        if(nl==0) return(NA_character_)
>                        lookWhere<-matchMat[sr, seq_len(nl)]
>                        cat("Will now look into:\n")
>                        print(lookWhere)
>                        cc<-which.max(lookWhere)
>                        cat("Max at position", cc, "\n")
>                        colnames(matchMat)[cc]
>                }, searchRow, nLeft)
> }
> I don't think there's that much difference. Depending on specific sizes, it
> may be more or less costly to first shrink the search matrix like I do. And,
> again depending on sizes, it may be better still to remove the rows that
> you're not interested in as well (some more, but similar, index trickery
> is required then).
>
> HTH,
>
>
> Nick Sabbe
> --
> ping: nick.sabbe_at_ugent.be
> link: http://biomath.ugent.be
> wink: A1.056, Coupure Links 653, 9000 Gent
> ring: 09/264.59.36
>
> -- Do Not Disapprove
>
>
>
>
>
> -----Original Message-----
> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On
> Behalf Of Murali.Menon_at_avivainvestors.com
> Sent: Thursday 31 March 2011 16:46
> To: r-help_at_r-project.org
> Subject: [R] choosing best 'match' for given factor
>
> Folks,
>
> I have a 'matching' matrix between variables A, X, L, O:
>
> > a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
> 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(
>    c("A", "X", "L", "O"), c("A", "X", "L", "O")))
>
> > a
>      A     X     L     O
> A  1.00  0.41  0.58  0.75
> X  0.41  1.00  0.60  0.86
> L  0.58  0.60  1.00  0.83
> O  0.75  0.86  0.83  1.00
>
> And I have a search vector of variables
>
> > v <- c("X", "O")
>
> I want to write a function bestMatch(searchvector, matchMat) such that for
> each variable in searchvector, I get the variable that it has the highest
> match to - but searching only among variables to the left of it in the
> 'matching' matrix, and not matching with any variable in searchvector
> itself.
>
> So in the above example, although "X" has the highest match (0.86) with "O",
> I can't choose "O" as it's to the right of X (and also because "O" is in the
> searchvector v already); I'll have to choose "A".
>
> For "O", I will choose "L", the variable it's best matched with - as it
> can't match "X" already in the search vector.
>
> My function bestMatch(v, a) will then return c("A", "L")
>
> My matrix a is quite large, and I have a long list of search vectors v, so I
> need an efficient method.
>
> I wrote this:
>
> bestMatch <- function(searchvector,  matchMat) {
>        sapply(searchvector, function(cc) {
>                             y <- matchMat[!(rownames(matchMat) %in%
> searchvector) & (seq_along(rownames(matchMat)) < match(cc, rownames(matchMat))),
> cc, drop = FALSE]
>                             rownames(y)[which.max(y)]
>        })
> }
>
> Any advice?
>
> Thanks,
>
> Murali
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

Received on Thu 31 Mar 2011 - 16:29:19 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.