Re: [R] subset rows in two dataframes

From: jim holtman <jholtman_at_gmail.com>
Date: Sun, 11 May 2008 12:39:27 -0400

Here is one way to compare the entire rows between the data frames:

> x1 <- do.call(paste, dat1)
> x2 <- do.call(paste, dat2)
> dat1[x1 %in% x2,]

[1] v1 v2
<0 rows> (or 0-length row.names)
>
> x1

 [1] "2006-01-03 7312.5" "2006-05-03 3352.5" "2006-05-04 4252.5" "2006-05-11 3825"
 [5] "2006-05-12 2700"   "2006-05-16 585"    "2006-05-19 810"    "2006-05-26 3015"
 [9] "2006-09-15 2925"   "2006-10-30 1102.5" "2006-11-08 2632.5" "2006-11-14 652.5"
[13] "2006-11-20 1417.5"
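
For readers of the archive, here is a minimal sketch of why the column-wise test in the original post returned a row, and of two row-wise alternatives. It reuses the data posted in this thread. The explicit separator passed to paste() and the use of merge() are additions for illustration and were not part of the reply above. Note that the paste() approach compares numbers through their character form, so values that differ only far beyond the printed digits could be treated as equal.

```r
# Data from the original post, repeated so the sketch is self-contained
time1 <- as.Date(c("2006-01-03", "2006-05-03", "2006-05-04", "2006-05-11",
                   "2006-05-12", "2006-05-16", "2006-05-19", "2006-05-26",
                   "2006-09-15", "2006-10-30", "2006-11-08", "2006-11-14",
                   "2006-11-20"))
volume1 <- c(7312.5, 3352.5, 4252.5, 3825.0, 2700.0, 585.0, 810.0,
             3015.0, 2925.0, 1102.5, 2632.5, 652.5, 1417.5)
dat1 <- data.frame(v1 = time1, v2 = volume1)

time2 <- as.Date(c("2006-05-03", "2006-05-09", "2006-05-04", "2006-05-08",
                   "2006-07-14", "2006-07-10", "2006-05-12", "2006-05-17",
                   "2006-05-19", "2006-05-26", "2006-05-29", "2006-05-18",
                   "2006-05-22", "2006-07-03", "2006-07-05", "2006-07-06",
                   "2006-07-04", "2006-07-24", "2006-07-12", "2006-07-18"))
volume2 <- c(4522.5, 7065.0, 3622.5, 7875.0, 3532.5, 3667.5, 6480.0,
             4612.5, 4005.0, 10350.0, 5310.0, 6345.0, 7177.5, 5107.5,
             4837.5, 3352.5, 4050.0, 6772.5, 7290.0, 5625.0)
dat2 <- data.frame(v1 = time2, v2 = volume2)

# Column-wise test: each column is checked on its own, so a row passes
# when its date occurs anywhere in dat2$v1 and its volume occurs
# anywhere in dat2$v2, even if those two values sit in different rows.
colwise <- subset(dat1, v1 %in% dat2$v1 & v2 %in% dat2$v2)

# The two values that let row 2 of dat1 through come from different rows
dat2[dat2$v1 == as.Date("2006-05-03"), ]  # this date, another volume
dat2[dat2$v2 == 3352.5, ]                 # this volume, another date

# Row-wise test: one key per row. The "\r" separator is an assumption
# added here so that neighbouring fields cannot run together.
key1 <- do.call(paste, c(dat1, sep = "\r"))
key2 <- do.call(paste, c(dat2, sep = "\r"))
common <- dat1[key1 %in% key2, ]

# Alternative: merge() with default arguments joins on all column
# names shared by the two data frames (here v1 and v2).
common_merge <- merge(dat1, dat2)
```

With this data, colwise holds the single unwanted row while common and common_merge are both empty, which is the result asked for.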


On Sun, May 11, 2008 at 10:27 AM, <partofy_at_inoutbox.com> wrote:

> Not exactly. I need something to subset ONLY rows common to both
> dataframes. In the provided example, dat1 and dat2 have no common rows
> so I would expect:
> [1] v1 v2
> <0 rows> (or 0-length row.names)
>
> But I can't do it...
>
>
>
>
> On Sun, 11 May 2008 10:07:25 -0400, "Zhuanshi He"
> <zhuanshi.he_at_gmail.com> said:
> > Dear Jim,
> >
> > Maybe you want this:
> >
> > > subset(dat2, time1 %in% dat2$v1 & time2 %in% dat2$v1)

> > v1 v2
> > 2 2006-05-09 7065.0
> > 3 2006-05-04 3622.5
> > 5 2006-07-14 3532.5
> > 7 2006-05-12 6480.0
> > 8 2006-05-17 4612.5
> > 15 2006-07-05 4837.5
> > 16 2006-07-06 3352.5
> > 18 2006-07-24 6772.5
> > 20 2006-07-18 5625.0
> > Warning message:
> > In time1 %in% dat2$v1 & time2 %in% dat2$v1 :
> > longer object length is not a multiple of shorter object length
> >
> >
> >
> > However, it looks like the lengths of time1 and time2 are different.
> >
> >
> --------------------------------------------------------------------------------------------------------------
> >
> > On 5/11/08, partofy_at_inoutbox.com <partofy_at_inoutbox.com> wrote:
> > >
> > > Dear list:
> > >
> > > I can now reproduce, with a bit of my real data, the problem I asked
> > > for your help with yesterday:
> > >
> > > time1<- as.Date(c("2006-01-03", "2006-05-03", "2006-05-04",
> > > "2006-05-11", "2006-05-12", "2006-05-16", "2006-05-19", "2006-05-26",
> > > "2006-09-15", "2006-10-30", "2006-11-08", "2006-11-14", "2006-11-20"))
> > > volume1<- c(7312.5, 3352.5, 4252.5, 3825.0, 2700.0, 585.0, 810.0,
> > > 3015.0, 2925.0, 1102.5, 2632.5, 652.5, 1417.5)
> > > dat1<- data.frame(v1=time1, v2=volume1)
> > >
> > > time2<- as.Date(c("2006-05-03", "2006-05-09", "2006-05-04",
> > > "2006-05-08", "2006-07-14", "2006-07-10", "2006-05-12", "2006-05-17",
> > > "2006-05-19", "2006-05-26", "2006-05-29", "2006-05-18", "2006-05-22",
> > > "2006-07-03", "2006-07-05", "2006-07-06", "2006-07-04", "2006-07-24",
> > > "2006-07-12", "2006-07-18"))
> > > volume2<- c(4522.5, 7065.0, 3622.5, 7875.0, 3532.5, 3667.5, 6480.0,
> > > 4612.5, 4005.0, 10350.0, 5310.0, 6345.0, 7177.5, 5107.5, 4837.5, 3352.5,
> > > 4050.0, 6772.5, 7290.0, 5625.0)
> > > dat2<- data.frame(v1=time2, v2=volume2)
> > >
> > > subset(dat1, v1 %in% dat2$v1 & v2 %in% dat2$v2)
> > > v1 v2
> > > 2 2006-05-03 3352.5
> > >
> > > This is not what I expect since this row is not present in dat2 and I
> > > just want records present in both dataframes.
> > >
> > > Help?
> > >
> > > J
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Sat, 10 May 2008 18:42:51 -0400, "jim holtman" <jholtman_at_gmail.com> said:
> > >
> > > > This seems to work for me:
> > > >
> > > > > set.seed(1)
> > > > > df1 <- data.frame(v1=factor(sample(1:4,20,TRUE)), v2=factor(sample(1:3,20,TRUE)), v3=sample(1:3,20,TRUE))
> > > > > df2 <- data.frame(v1=factor(sample(1:2,20,TRUE)), v2=factor(sample(1:2,20,TRUE)), v3=sample(1:2,20,TRUE))
> > > > > subset(df1, (df1$v1 %in% df2$v1) & (df1$v2 %in% df2$v2) & (df1$v3 %in% df2$v3))
> > > > v1 v2 v3
> > > > 2 2 1 2
> > > > 5 1 1 2
> > > > 11 1 2 2
> > > > 14 2 1 1
> > > > >
> > > >
> > > > Exactly what problems are you having? A sample of your actual data
> > > > would be useful.
> > > >
> > > > On Sat, May 10, 2008 at 6:31 PM, <partofy_at_inoutbox.com> wrote:
> > > > > Dear list:
> > > > >
> > > > > I have two dataframes, say dat1 and dat2. Each has several variables, but
> > > > > 3 of them are common to both (say v1, v2 and v3). v1 and v2 are
> > > > > factors while v3 is numeric. Now, I need a subset to extract the rows
> > > > > in which v1, v2 and v3 are the same in both dataframes.
> > > > > I tried:
> > > > >
> > > > > subset(dat1, dat1$v1 %in% dat2$v1 & dat1$v2 %in% dat2$v2 &
> > > > > dat1$v3 %in% dat2$v3)
> > > > >
> > > > > I don't know why, but this is not working as I was expecting. Any
> > > > > suggestion to improve my code?
> > > > >
> > > > > Thanks in advance
> > > > >
> > > > > Justin
> > > > > --
> > > > >
> > > > > partofy_at_inoutbox.com
> > > > >
> > > > > --
> > > > >
> > > > > ______________________________________________
> > > > > R-help_at_r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Jim Holtman
> > > > Cincinnati, OH
> > > > +1 513 646 9390
> > > >
> > > > What is the problem you are trying to solve?
> > >
> > > --
> > >
> > >
> > > partofy_at_inoutbox.com
> > >
> > > --
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Zhuanshi He / Z. He (PhD)
> > Waterloo Centre for Atmospheric Sciences (WCAS)
> > Department of Earth and Environmental Sciences
> > Phy Bldg, Rm 2022
> > University of Waterloo,
> > Waterloo, ON N2L 3G1
> > Canada
> > Tel: +1-519-888-4567 ext 38053 FAX: +1-519-746-0435
> --
>
> partofy_at_inoutbox.com
>
> --
>
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?




Received on Sun 11 May 2008 - 16:46:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 11 May 2008 - 17:30:37 GMT.
