Re: [R] choosing best 'match' for given factor

From: Nick Sabbe <nick.sabbe_at_ugent.be>
Date: Thu, 31 Mar 2011 17:30:51 +0200

Hi Murali.
I haven't compared, but this is what I would do:

bestMatch<-function(searchVector, matchMat) {

	searchRow<-unique(sort(match(searchVector, colnames(matchMat)))) #if you're sure, you could drop unique

	cat("Original row indices:")
	print(searchRow)
	matchMat<-matchMat[, -searchRow, drop=FALSE] #avoid duplicates altogether
	cat("Corrected Matrix:\n")
	print(matchMat)
	correctedRows<-searchRow - seq_along(searchRow) + 1 #works because of the sort above
	cat("Corrected row indices:")
	print(correctedRows)
	sapply(seq_along(searchRow), function(i){
			#only columns were dropped, so the row is still at its original index
			lookWhere<-matchMat[searchRow[i], seq_len(correctedRows[i]-1)]
			cat("Will now look into:\n")
			print(lookWhere)
			cc<-which.max(lookWhere)
			if (length(cc) == 0) return(NA_character_) #nothing to the left
			cat("Max at position", cc, "\n")
			colnames(matchMat)[cc]
		})

}
I don't think there's much difference in speed between your version and mine. Depending on the specific sizes, it may be more or less costly to first shrink the search matrix as I do. Likewise, it may be better still to also remove the rows you're not interested in (that requires some more, but similar, index trickery).
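For illustration, a minimal sketch of that last variant, shrinking both rows and columns before searching. The function name bestMatchSubset is hypothetical and not code from this thread; it assumes every search variable is a column name of matchMat and returns NA when no admissible variable lies to the left. After dropping the search columns, the non-search columns keep their order, so for the i-th (sorted) search variable at position pos[i], exactly pos[i] - i of them lie to its left.

```r
# Sketch only: bestMatchSubset is a hypothetical name, not code from the thread.
# Assumes every element of searchVector is a column name of matchMat.
bestMatchSubset <- function(searchVector, matchMat) {
  # sorted, de-duplicated column positions of the search variables
  pos <- sort(unique(match(searchVector, colnames(matchMat))))
  # keep only the search rows and only the non-search columns
  small <- matchMat[pos, -pos, drop = FALSE]
  # number of non-search columns strictly to the left of each search variable
  nLeft <- pos - seq_along(pos)
  res <- vapply(seq_along(pos), function(i) {
    if (nLeft[i] == 0) return(NA_character_)  # no admissible candidate
    cand <- small[i, seq_len(nLeft[i])]
    colnames(small)[which.max(cand)]
  }, character(1))
  # results follow the column order of matchMat, not the order of searchVector
  setNames(res, colnames(matchMat)[pos])
}

a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58, 0.6, 1, 0.83,
                 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))
bestMatchSubset(c("X", "O"), a)  # returns c(X = "A", O = "L")
```

Each search variable then costs one which.max over at most ncol(matchMat) - length(searchVector) values; whether that pays off against shrinking only the columns still depends on the sizes involved.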

HTH, Nick Sabbe

--
ping: nick.sabbe_at_ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Murali.Menon_at_avivainvestors.com
Sent: Thursday 31 March 2011 16:46
To: r-help_at_r-project.org
Subject: [R] choosing best 'match' for given factor

Folks,

I have a 'matching' matrix between variables A, X, L, O:


> a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58, 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))
> a
     A    X    L    O
A 1.00 0.41 0.58 0.75
X 0.41 1.00 0.60 0.86
L 0.58 0.60 1.00 0.83
O 0.75 0.86 0.83 1.00

And I have a search vector of variables:

> v <- c("X", "O")

I want to write a function bestMatch(searchvector, matchMat) such that, for each variable in searchvector, I get the variable with which it has the highest match, but searching only among variables to the left of it in the 'matching' matrix, and never matching any variable in searchvector itself.

So in the above example, although "X" has its highest match (0.86) with "O", I can't choose "O", as it's to the right of X (and also because "O" is already in the search vector v); I'll have to choose "A". For "O", I will choose "L", the variable it's best matched with, as it can't match "X", which is already in the search vector.

My function bestMatch(v, a) will then return c("A", "L").

My matrix a is quite large, and I have a long list of search vectors v, so I need an efficient method. I wrote this:

bestMatch <- function(searchvector, matchMat) {
    sapply(searchvector, function(cc) {
        # index() is not a base R function; seq_len(nrow(matchMat)) gives the row positions
        y <- matchMat[!(rownames(matchMat) %in% searchvector) &
                      (seq_len(nrow(matchMat)) < match(cc, rownames(matchMat))),
                      cc, drop = FALSE]
        rownames(y)[which.max(y)]
    })
}

Any advice?

Thanks,
Murali

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
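Since the question stresses a large matrix and a long list of search vectors, one fully vectorised alternative avoids the per-variable loop: build a logical mask of admissible (search variable, candidate) pairs and take the row-wise maximum with base R's max.col(). This is a sketch, not code from the thread; the name bestMatchMask is hypothetical, it assumes every element of searchVector is a column name of matchMat, and it returns NA for a variable with no admissible candidate.

```r
# Sketch only: bestMatchMask is a hypothetical name, not code from the thread.
# Assumes every element of searchVector is a column name of matchMat.
bestMatchMask <- function(searchVector, matchMat) {
  pos <- match(searchVector, colnames(matchMat))
  vals <- matchMat[pos, , drop = FALSE]  # one row per search variable
  # allowed[i, j]: column j lies strictly to the left of searchVector[i] ...
  allowed <- outer(pos, seq_len(ncol(matchMat)), ">")
  # ... and is not itself one of the search variables
  allowed[, pos] <- FALSE
  vals[!allowed] <- -Inf
  best <- max.col(vals, ties.method = "first")  # row-wise argmax
  res <- colnames(matchMat)[best]
  res[rowSums(allowed) == 0] <- NA  # nothing admissible to the left
  setNames(res, searchVector)
}

a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58, 0.6, 1, 0.83,
                 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))
bestMatchMask(c("X", "O"), a)  # returns c(X = "A", O = "L")

# several search vectors at once
lapply(list(c("X", "O"), c("A", "O")), bestMatchMask, matchMat = a)
```

Unlike the loop versions, the result keeps the order of searchVector. The trade-off is memory: each call builds a length(searchVector) by ncol(matchMat) mask, which is cheap for short search vectors but grows with them.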
Received on Thu 31 Mar 2011 - 15:32:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 31 Mar 2011 - 15:50:25 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
