Re: [R] Compare two data sets

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Wed, 26 Mar 2008 01:50:59 +0000 (UTC)

<amarkey_at_uiuc.edu> wrote in
news:20080325101909.BDK93111_at_expms2.cites.uiuc.edu:

> I would like to compare two data sets saved as text files (example
> below) to determine if both sets are identical (or if dat2 is missing
> information that is included in dat1) and, if they are not identical,
> list what information is different between the two sets (i.e., output
> "a1", "a3" as the differing information). The overall purpose would
> be to remove "a1" and "a3" from dat1 so both dat1 and dat2 are the
> same. My R abilities are somewhat limited so any suggestions are
> greatly appreciated.

I do not understand what it would mean to remove elements so "they would look the same". Why wouldn't you just use the smaller set?
>
> Alysta
>
> dat1
> a1
> a2
> a3
> a4
> a5
> a6
>
> dat2
> a2
> a4
> a5
> a6

You might want to look at the %in% operator. The examples below were created so that neither dat1 nor dat2 is a subset of the other.

dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6,8,9,10), sep='')
> dat1

[1] "a1" "a2" "a3" "a4" "a5" "a6"
> dat2

[1] "a2" "a4" "a5" "a6" "a8" "a9" "a10"
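
As background, the help page for match() documents %in% as a thin wrapper around match(). The sketch below rebuilds the same two vectors and checks that equivalence; the name via_match is only illustrative and is not part of the original examples.

```r
# Same vectors as in the examples above
dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6,8,9,10), sep='')

# match() returns the position of each dat2 element in dat1,
# or 0 (because of nomatch = 0) when the element is absent
via_match <- match(dat2, dat1, nomatch = 0) > 0

# Should agree element by element with the %in% form
identical(via_match, dat2 %in% dat1)
```

So the result is a logical vector with one entry per element of the left-hand operand, which is what makes it usable as an index.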

dat2 %in% dat1
#[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE
dat1 %in% dat2
#[1] FALSE TRUE FALSE TRUE TRUE TRUE

### And then use the logical vectors as index arguments
### to first get the common elements
> dat1[dat1 %in% dat2]

[1] "a2" "a4" "a5" "a6"

> dat2[dat2 %in% dat1]

[1] "a2" "a4" "a5" "a6"

### And then to find the non-shared elements
> dat2[!(dat2 %in% dat1)]

[1] "a8" "a9" "a10"
> dat1[!(dat1 %in% dat2)]

[1] "a1" "a3"
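
Since the original data are in text files, here is a rough sketch of the whole task using the poster's own example values and the base R set functions setequal() and setdiff(). It assumes one item per line in each file; the temporary files and the names f1, f2, d1, d2, only_in_d1, only_in_d2 and d1_trimmed are made up for illustration, so substitute your own file paths.

```r
# Hypothetical stand-ins for the two text files (one item per line)
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(paste('a', 1:6, sep=''), f1)
writeLines(paste('a', c(2,4:6), sep=''), f2)

# Read each file into a character vector
d1 <- readLines(f1)
d2 <- readLines(f2)

# Do both files hold the same set of items (ignoring order and duplicates)?
setequal(d1, d2)

# Items present in one file but missing from the other
only_in_d1 <- setdiff(d1, d2)
only_in_d2 <- setdiff(d2, d1)

# Drop from d1 everything that d2 lacks; unlike intersect(),
# logical indexing keeps the original order and any duplicates
d1_trimmed <- d1[d1 %in% d2]
identical(d1_trimmed, d2)
```

Note that setdiff() and intersect() discard duplicated values, so the %in% indexing is the safer choice if repeated entries in the files matter.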

-- 
David Winsemius

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 26 Mar 2008 - 04:58:55 GMT
