Re: [R] merging/intersecting 2 data frames

From: jim holtman <jholtman_at_gmail.com>
Date: Tue, 29 Jun 2010 15:31:25 -0400

Use 'merge', with by.x and by.y naming the key column in each data frame:

> a.df
        DATE GENDER PATIENT_ID AGE             SYNDROME
1  4/16/2009      F      23686  45         RASH ON BODY
2  4/16/2009      F      13840  35         CANT URINATE
3  4/16/2009      M      12895  30       BLURRED VISION
4  4/16/2009      M      18375  33       UNABLE TO VOID
5  4/16/2009      M       2237  44         SOB WEAKNESS
6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
7  4/16/2009      M      10783  37          RT ARM PAIN
8  4/16/2009      M      12610  65        L FOOT INJURY
9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
10 4/16/2009      F        351  36           PT STS MVA

> b.df
   DATE_OF_DEATH    ID
1      4/19/2009 23686
2      4/19/2009 13840
3      4/19/2009 12895
4      4/19/2009 18375
5      4/19/2009   351
6      4/20/2009  3495
7      4/20/2009  4084
8      4/20/2009 19616
9      4/20/2009 17965
10     4/20/2009 11863

> merge(a.df, b.df, by.x="PATIENT_ID", by.y="ID")
  PATIENT_ID      DATE GENDER AGE             SYNDROME DATE_OF_DEATH
1        351 4/16/2009      F  36           PT STS MVA     4/19/2009
2       3495 4/16/2009      F  29 URINARY DIFFICULTIES     4/20/2009
3      12895 4/16/2009      M  30       BLURRED VISION     4/19/2009
4      13840 4/16/2009      F  35         CANT URINATE     4/19/2009
5      18375 4/16/2009      M  33       UNABLE TO VOID     4/19/2009
6      23686 4/16/2009      F  45         RASH ON BODY     4/19/2009

>
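
For anyone who wants to paste this into a session, here is a rough self-contained sketch of the same merge. The values are retyped from the a.df and b.df listings in this reply (note that this b.df has different IDs from the rows shown in the original question), the columns are assumed to be plain character and numeric vectors, and the names 'matched' and 'kept_all' are just illustrative. The all.x = TRUE call is not part of the request; it is only there to show what the default is leaving out.

```r
# Rebuild the two data frames from the listings in this reply.
# Assumption: DATE columns are character strings and IDs are numeric.
a.df <- data.frame(
  DATE       = "4/16/2009",   # same date on every row, recycled to 10 rows
  GENDER     = c("F", "F", "M", "M", "M", "F", "M", "M", "F", "F"),
  PATIENT_ID = c(23686, 13840, 12895, 18375, 2237, 21484, 10783, 12610, 3495, 351),
  AGE        = c(45, 35, 30, 33, 44, 41, 37, 65, 29, 36),
  SYNDROME   = c("RASH ON BODY", "CANT URINATE", "BLURRED VISION",
                 "UNABLE TO VOID", "SOB WEAKNESS", "TOOTH PAINTOOTH PAIN",
                 "RT ARM PAIN", "L FOOT INJURY", "URINARY DIFFICULTIES",
                 "PT STS MVA"),
  stringsAsFactors = FALSE
)

b.df <- data.frame(
  DATE_OF_DEATH = c(rep("4/19/2009", 5), rep("4/20/2009", 5)),
  ID            = c(23686, 13840, 12895, 18375, 351,
                    3495, 4084, 19616, 17965, 11863),
  stringsAsFactors = FALSE
)

# Default merge keeps only rows whose PATIENT_ID also appears in b.df$ID,
# and appends DATE_OF_DEATH from b.df. Rows are sorted by the key.
matched <- merge(a.df, b.df, by.x = "PATIENT_ID", by.y = "ID")

# For contrast only: all.x = TRUE keeps every row of a.df and fills
# DATE_OF_DEATH with NA where there is no match.
kept_all <- merge(a.df, b.df, by.x = "PATIENT_ID", by.y = "ID", all.x = TRUE)

print(matched)
```

With these inputs, 'matched' should have one row per patient found in both data frames, which is the behaviour asked for (unmatched rows omitted).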

On Tue, Jun 29, 2010 at 3:21 PM, Erin Hodgess <erinm.hodgess_at_gmail.com> wrote:
> Dear R People:
>
> I have two data frames, a.df and b.df as seen here:
>
>> a.df[1:10,]
>        DATE GENDER PATIENT_ID AGE             SYNDROME
> 1  4/16/2009      F      23686  45         RASH ON BODY
> 2  4/16/2009      F      13840  35         CANT URINATE
> 3  4/16/2009      M      12895  30       BLURRED VISION
> 4  4/16/2009      M      18375  33       UNABLE TO VOID
> 5  4/16/2009      M       2237  44         SOB WEAKNESS
> 6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
> 7  4/16/2009      M      10783  37          RT ARM PAIN
> 8  4/16/2009      M      12610  65        L FOOT INJURY
> 9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
> 10 4/16/2009      F        351  36           PT STS MVA
>> b.df[1:10,]
>   DATE_OF_DEATH    ID
> 1      4/19/2009 21676
> 2      4/19/2009 13717
> 3      4/19/2009 20498
> 4      4/19/2009 14281
> 5      4/19/2009 38848
> 6      4/20/2009   331
> 7      4/20/2009  4084
> 8      4/20/2009 19616
> 9      4/20/2009 17965
> 10     4/20/2009 11863
>>
>
> a.df will always be larger than b.df.
>
> I want to create a third data frame that is matched on PATIENT_ID from
> a.df and ID from b.df.
>
> If there is no match from a.df$PATIENT_ID to b.df$ID, then we omit the
> row from the new data.frame.
>
> If there is a match, we include the DATE_OF_DEATH column from b.df.
>
> I've tried all kinds of tricks, but nothing works exactly as I wish.
>
> Thanks in advance,
> Sincerely,
> Erin
>
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodgess_at_gmail.com
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Received on Tue 29 Jun 2010 - 20:34:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 30 Jun 2010 - 21:00:43 GMT.