From: Charles C. Berry <cberry_at_tajo.ucsd.edu>

Date: Thu 11 Jan 2007 - 22:26:05 GMT

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri Jan 12 09:37:07 2007

Date: Thu 11 Jan 2007 - 22:26:05 GMT

On Thu, 11 Jan 2007, Farrel Buchinsky wrote:

> Am I correct in believing that one cannot match on multiple columns?

*> One can indeed subset on multiple criteria from different variables
**> (or columns) but not from unique combinations thereof.
**> I need to exclude about 10000 rows from 108000 rows of data based on
**> several unique combinations of identifiers in two columns. Only
**> merge() seems to be able to do that. Merge would allow me to
**> positively select but it would not allow me to deselect (or exclude).
**> Look at how I got around the problem.
**> It is inelegant. Have a missed a more direct function?
**>
*

I guess that depends on what you regard as 'more direct'.

newdata[ !is.element( interaction(newdata), interaction(exclude) ) , ]

*>
*

> x<-seq(from=1,to=10)

*> y <-rep(1:2,5)
**> data<-data.frame(cbind(x,y))
**> x<-seq(from=1,to=10)
**> y <-rep(1:5,2)
**> data1<-data.frame(cbind(x,y))
**> newdata<-rbind(data,data1)
**> newdata[10,2] <- 3
**> exclude<-newdata[18:20,]
**> #This is a simulation of real life problem
**> #We now have two dataframes. Newdata is the data set from an
**> experiment. In real #life it has an additional 9 columns of data
**> #exclude is a dataframe that was manually created after discovering
**> some quality #control issues
**> #Any row in newdata that matches any row in exclude must be discarded
**> match(newdata$x,exclude$x)# useless because it is only on one column
**> match(newdata$y,exclude$y)# useless because it is only on one column
**> newdata$x %in% exclude$x # useless because it is only on one column
**> newdata$y %in% exclude$y # useless because it is only on one column
**> newdata$x %in% exclude$x & newdata$y %in% exclude$y # useless,
**> #eventhough it is using both columns, because it is not
**> #using them in a synchronous manner. Row 10 in new data should not
**> have been #marked "TRUE"
**> #It was only labeled such because the 10 in the x column is indeed in
**> the exclude x #column and the 3 in the y column is indeed in the
**> exclude y column but not #together
**> which(newdata$x %in% exclude$x & newdata$y %in% exclude$y)#also
**> gets it #wrong
**> match(newdata,exclude)# intuitively this could have worked but alas
**> match can #only handle vectors and not dataframes. It cannot match on
**> multiple columns
**> #I have to stoop to the inelegant maneuver of creating a combined
**> variable of the #two columns, albeit only temporarily
**> paste(newdata$x,newdata$y,sep=":") %in% paste(exclude$x,exclude$y,sep=":")
**> #or one could do this
**> match(paste(newdata$x,newdata$y,sep=":"),paste(exclude$x,exclude$y,sep=":"))
**> #or
**> {which(paste(newdata$x,newdata$y,sep=":") %in% paste
**> (exclude$x,exclude$y,sep=":"))}
**> #Or if you want to use the result in an index term or a selection
**> argument in a subset command
**> {paste(newdata$x,newdata$y,sep=":") %in%
**> paste(exclude$x,exclude$y,sep=":")==FALSE}
**> #As in
**> {newdata[paste(newdata$x,newdata$y,sep=":") %in%
**> paste(exclude$x,exclude$y,sep=":")==FALSE,]}
**> #or
**> {subset(newdata,paste(newdata$x,newdata$y,sep=":") %in%
**> paste(exclude$x,exclude$y,sep=":")==FALSE)}
**>
**>
**> --
**> Farrel Buchinsky
**>
**> ______________________________________________
**> R-help@stat.math.ethz.ch mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
**> and provide commented, minimal, self-contained, reproducible code.
**>
*

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry@tajo.ucsd.edu UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901 ______________________________________________R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri Jan 12 09:37:07 2007

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Fri 12 Jan 2007 - 00:30:26 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*