[R] One-to-one matching?

From: <Alec.Zwart_at_csiro.au>
Date: Mon, 23 Jun 2008 12:57:26 +1000


Hi folks,

Can anyone suggest an efficient way to do "matching without replacement", or "one-to-one matching"? pmatch() doesn't quite provide what I need...

For example,

lookupTable <- c("a","b","c","d","e","f")
matchSample <- c("a","a","b","d")
##Normal match() behaviour:
match(matchSample,lookupTable)
[1] 1 1 2 4

My problem here is that both "a"s in matchSample are matched to the same "a" in the lookup table. I need the elements of the lookup table to be excluded from the table as they are matched, so that no match can be found for the second "a".
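To pin down the intended semantics, here is a minimal loop-based sketch. It is exactly the kind of for loop the question hopes to avoid, but it is handy as a reference for checking faster versions. The function name matchWithoutReplacement is hypothetical. It assumes character vectors without NA values; unlike match(), it never matches NA.

```r
# Reference (loop-based) one-to-one matching: each lookup-table entry
# can be used at most once, and only exact matches count.
# matchWithoutReplacement is a hypothetical name; assumes no NA values.
matchWithoutReplacement <- function(x, table) {
  result <- rep(NA_integer_, length(x))
  available <- rep(TRUE, length(table))  # entries not yet matched
  for (i in seq_along(x)) {
    # first still-available entry exactly equal to x[i] (NA if none)
    hit <- which(available & table == x[i])[1]
    if (!is.na(hit)) {
      result[i] <- hit
      available[hit] <- FALSE  # exclude it from later matches
    }
  }
  result
}

matchWithoutReplacement(c("a","a","b","d"), c("a","b","c","d","e","f"))  # 1 NA 2 4
```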

Function pmatch() comes close to what I need:

pmatch(matchSample,lookupTable)
[1] 1 NA 2 4

Yep! However, pmatch() incorporates partial matching, which I definitely don't want:

lookupTable <- c("a","b","c","d","e","aaaaaaaaf")
matchSample <- c("a","a","b","d")
pmatch(matchSample,lookupTable)
[1] 1 6 2 4

## i.e. the second "a" matches "aaaaaaaaf"; I don't want this.
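One possible workaround, sketched under assumptions: according to its documentation, pmatch() resolves exact matches before it tries partial ones, and with duplicates.ok = FALSE each table entry is used at most once. Discarding any result whose lookup-table entry is not identical to the sample element should therefore leave only the exact one-to-one matches. The helper name exactPmatch is hypothetical. It assumes matchSample contains no NA and no empty strings (pmatch() never matches "", not even exactly), and the partial-matching stage still costs time on long vectors.

```r
# Keep only the exact matches that pmatch() finds; drop partial ones.
# exactPmatch is a hypothetical helper; assumes no NA or "" in x.
exactPmatch <- function(x, table) {
  res <- pmatch(x, table, duplicates.ok = FALSE)
  # A matched entry that differs from x must have been a partial match
  partial <- !is.na(res) & table[res] != x
  res[partial] <- NA_integer_
  res
}

exactPmatch(c("a","a","b","d"), c("a","b","c","d","e","aaaaaaaaf"))  # 1 NA 2 4
```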

Of course, when identical items ARE duplicated in both sample and lookup table, I need the matching to reflect this:

lookupTable <- c("a","a","c","d","e","f")
matchSample <- c("a","a","c","d")
##Normal match() behaviour
match(matchSample,lookupTable)
[1] 1 1 3 4

No good - pmatch() is better:

lookupTable <- c("a","a","c","d","e","f")
matchSample <- c("a","a","c","d")
pmatch(matchSample,lookupTable)
[1] 1 2 3 4

...but we still have the partial matching issue...

##And of course, as per the usual behaviour of match(), sample elements missing from the lookup table should return NA:

matchSample <- c("a","frog","e","d")
print(matchSample)
match(matchSample,lookupTable)

Is there a nifty way to get what I'm after without resorting to a for loop? (my code's already got too blasted many of those...)
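A vectorised sketch that avoids both the loop and partial matching: number each occurrence of a value (first "a", second "a", ...) in both vectors, then match on the (value, occurrence) pair with plain match(). The name matchOneToOne is hypothetical. Encoding the pair as a single number built from match() ids, rather than pasting strings together, is an assumption chosen so that values containing separators or NA need no special handling.

```r
# Vectorised one-to-one matching: the k-th occurrence of a value in x
# may only match the k-th occurrence of the same value in table.
# matchOneToOne is a hypothetical name.
matchOneToOne <- function(x, table) {
  # Integer id per distinct value: position of its first occurrence in
  # table (NA when the value of x does not occur in table at all).
  idX <- match(x, table)
  idT <- match(table, table)
  # Running occurrence number within each id: 1, 2, 3, ...
  # (ave() leaves positions with an NA id untouched; their key is NA anyway.)
  occurrence <- function(id) ave(seq_along(id), id, FUN = seq_along)
  # Encode (id, occurrence) as one number; unique because id lies in 1..n.
  # as.numeric() avoids integer overflow for long tables.
  n <- as.numeric(length(table))
  keyX <- (occurrence(idX) - 1) * n + idX
  keyT <- (occurrence(idT) - 1) * n + idT
  match(keyX, keyT)
}

lookupTable <- c("a","a","c","d","e","f")
matchOneToOne(c("a","a","c","d"), lookupTable)     # 1 2 3 4
matchOneToOne(c("a","frog","e","d"), lookupTable)  # 1 NA 5 4
```

Because the k-th "a" in the sample can only pair with the k-th "a" in the table, this should agree with taking the first unused exact match from left to right. An NA in the sample is paired one-to-one with an NA in the table, as match() would do.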

Thanks,

Alec Zwart
CMIS CSIRO
alec.zwart_at_csiro.au



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 23 Jun 2008 - 03:01:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.