From: Farrel Buchinsky <fjbuch_at_gmail.com>

Date: Thu 11 Jan 2007 - 20:56:52 GMT

Am I correct in believing that one cannot match on multiple columns?
One can indeed subset on multiple criteria from different variables
(or columns) but not from unique combinations thereof.
I need to exclude about 10000 rows from 108000 rows of data based on
several unique combinations of identifiers in two columns. Only
merge() seems to be able to do that. Merge would allow me to
positively select but it would not allow me to deselect (or exclude).
Look at how I got around the problem.

It is inelegant. Have I missed a more direct function?

x<-seq(from=1,to=10)

y <-rep(1:2,5)

data<-data.frame(cbind(x,y))

x<-seq(from=1,to=10)

y <-rep(1:5,2)

data1<-data.frame(cbind(x,y))

newdata<-rbind(data,data1)

newdata[10,2] <- 3

exclude<-newdata[18:20,]

#This is a simulation of a real-life problem.

#We now have two data frames. newdata is the data set from an experiment.
#In real life it has an additional 9 columns of data.

#exclude is a data frame that was manually created after discovering some
#quality control issues.

#Any row in newdata that matches any row in exclude must be discarded

match(newdata$x,exclude$x) # useless because it is only on one column

match(newdata$y,exclude$y) # useless because it is only on one column

newdata$x %in% exclude$x # useless because it is only on one column

newdata$y %in% exclude$y # useless because it is only on one column

newdata$x %in% exclude$x & newdata$y %in% exclude$y # useless,
#even though it is using both columns, because it is not
#using them in a synchronous manner. Row 10 in newdata should not
#have been marked "TRUE".
#It was only labeled such because the 10 in the x column is indeed in
#the exclude x column and the 3 in the y column is indeed in the
#exclude y column, but not together.

which(newdata$x %in% exclude$x & newdata$y %in% exclude$y) #also gets it wrong
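
The false positive on row 10 can be checked directly. The following is a minimal sketch that rebuilds the example data so it runs on its own; the names flagged and row10 are introduced here for illustration only.

```r
# Rebuild the example data from the post
x <- 1:10
newdata <- rbind(data.frame(x = x, y = rep(1:2, 5)),
                 data.frame(x = x, y = rep(1:5, 2)))
newdata[10, 2] <- 3
exclude <- newdata[18:20, ]

# Each column is tested on its own, so the two conditions are not tied
# to the same row of exclude
flagged <- which(newdata$x %in% exclude$x & newdata$y %in% exclude$y)
flagged                                            # 10 18 19 20

# Row 10 is (x = 10, y = 3): both values occur somewhere in exclude ...
row10 <- newdata[10, ]
any(exclude$x == row10$x)                          # TRUE
any(exclude$y == row10$y)                          # TRUE

# ... but never together in the same row of exclude
any(exclude$x == row10$x & exclude$y == row10$y)   # FALSE
```

Rows 18 to 20 are the genuine matches; row 10 is flagged only because its x value and its y value each appear in a different row of exclude.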

match(newdata,exclude) # intuitively this could have worked, but alas match can
#only handle vectors and not data frames. It cannot match on multiple columns.

#I have to stoop to the inelegant maneuver of creating a combined
#variable of the two columns, albeit only temporarily

paste(newdata$x,newdata$y,sep=":") %in% paste(exclude$x,exclude$y,sep=":")

#or one could do this

match(paste(newdata$x,newdata$y,sep=":"),paste(exclude$x,exclude$y,sep=":"))

#or

which(paste(newdata$x,newdata$y,sep=":") %in% paste(exclude$x,exclude$y,sep=":"))
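
The match(), %in% and which() forms on the pasted key are closely related. The sketch below rebuilds the example data so it runs on its own and shows how the three results line up; key_new, key_excl, pos, flag and idx are illustrative names that do not appear in the original script.

```r
# Rebuild the example data from the post
x <- 1:10
newdata <- rbind(data.frame(x = x, y = rep(1:2, 5)),
                 data.frame(x = x, y = rep(1:5, 2)))
newdata[10, 2] <- 3
exclude <- newdata[18:20, ]

# Temporary combined keys, one string per row
key_new  <- paste(newdata$x, newdata$y, sep = ":")
key_excl <- paste(exclude$x, exclude$y, sep = ":")

pos  <- match(key_new, key_excl)  # position in exclude, NA where no match
flag <- key_new %in% key_excl     # TRUE/FALSE for each row of newdata
idx  <- which(flag)               # row numbers of newdata that match

identical(flag, !is.na(pos))      # TRUE: %in% is "match() found something"
idx                               # 18 19 20
```

Which form to use depends on what is needed next: pos tells you which exclude row was hit, flag can be negated and used as a row index, and idx gives row numbers.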

#Or if you want to use the result in an index term or a selection
#argument in a subset command

paste(newdata$x,newdata$y,sep=":") %in% paste(exclude$x,exclude$y,sep=":")==FALSE

#As in

newdata[paste(newdata$x,newdata$y,sep=":") %in% paste(exclude$x,exclude$y,sep=":")==FALSE,]

#or

subset(newdata,paste(newdata$x,newdata$y,sep=":") %in% paste(exclude$x,exclude$y,sep=":")==FALSE)
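
On the question of whether merge() can be made to deselect: one possible approach, offered only as a sketch and not as an established answer, is to tag the rows of exclude with a flag column, do a left join, and keep the rows where the flag came back missing. The column name drop_flag and the objects tagged, merged and kept are hypothetical names chosen for this example.

```r
# Rebuild the example data from the post
x <- 1:10
newdata <- rbind(data.frame(x = x, y = rep(1:2, 5)),
                 data.frame(x = x, y = rep(1:5, 2)))
newdata[10, 2] <- 3
exclude <- newdata[18:20, ]

# Tag each distinct (x, y) pair to be dropped; unique() guards against
# duplicated exclude rows multiplying rows in the join
tagged <- unique(exclude[, c("x", "y")])
tagged$drop_flag <- TRUE

# Left join: every row of newdata is kept, and rows without a partner
# in tagged get drop_flag = NA
merged <- merge(newdata, tagged, by = c("x", "y"), all.x = TRUE)

# Keep only the rows that found no partner in exclude
kept <- merged[is.na(merged$drop_flag), c("x", "y")]
nrow(kept)   # 17
```

Note that merge() returns the rows sorted by the key columns and with new row names, so the original row order of newdata is not preserved. If order matters, the pasted-key index above may still be the simpler route.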

--
Farrel Buchinsky

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri Jan 12 08:05:34 2007
