Re: [R] Matched pairs with two data frames

From: Daniel Malter <>
Date: Wed, 16 Apr 2008 06:19:12 -0400

Hi, sorry for jumping in here, but to me your description of why you want to have only the needed data rows remains ambiguous.

If you just want to select the data you indicate, then you do: data[data$needed == "yes", ]

where "data" is the name of your long dataset (13 obs). As I see it, you are just selecting data rows from the whole data set; you are not merging or unstacking data in any way.
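
A minimal sketch of that row selection, assuming the 13-row merged table from the quoted message below has been typed into a data frame called "data" (the values are copied from that table; "selected" is a name made up for this example):

```r
# The merged table from the quoted message, entered by hand.
data <- data.frame(
  age    = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4),
  school = c(10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 33, 11),
  out1   = c(9.5, 9.5, 9.5, 9.5, 2.3, 2.3, 2.3, 2.3, 3.3, 4.1, 5.9, NA, 4.6),
  out2   = c(1.1, 2.0, 3.5, 4.9, 1.1, 2.0, 3.5, 4.9, 6.5, 6.5, 6.5, 5.2, NA),
  needed = c("yes", "no", "no", "no", "no", "yes", "no", "no",
             "yes", "no", "no", "no", "no")
)

# The logical condition goes in the row position (before the comma);
# leaving the column position empty keeps all columns.
selected <- data[data$needed == "yes", ]
print(selected)
```

This only works once "needed" exists, which is why the rule behind "yes" and "no" matters.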

If that is true, it would be more helpful to know why it is lines 1, 6, and 9 that you need and not the others. That is, is there a systematic reason for "yes" and "no" in the "needed" variable, a reason that could be coded? Or is it just your (arbitrary) selection? This is the part of your question that I do not understand at all. If there is a reason, then we would need to know the criteria on which to base the coding for this variable. What makes a yes a yes and a no a no?


cuncta stricte discussurus

-----Original Message-----
From: [] On Behalf Of Udo
Sent: Wednesday, April 16, 2008 5:59 AM
To:
Subject: Re: [R] Matched pairs with two data frames

My intention was to perform a one-to-one exact match, which pairs each treated unit with ONE control unit (without replacement), using my two confounders (age, school) for matching.
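
One possible way to get such a one-to-one exact match in base R is to number the rows within each (age, school) stratum in both data frames and then merge on that counter as well. This is only a sketch: the two input data frames are not shown in this message, so "treated" and "control" below are reconstructed from the merged table further down, and the names "treated", "control", "k" and "matched" are assumptions made for illustration.

```r
# Reconstructed inputs (assumption: the original row order within strata is unknown).
treated <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                      school = c(10, 10, 20, 20, 20, 11),
                      out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 2, 3),
                      school = c(10, 10, 10, 10, 20, 33),
                      out2   = c(1.1, 2.0, 3.5, 4.9, 6.5, 5.2))

# Running counter 1, 2, 3, ... within each (age, school) stratum.
treated$k <- ave(seq_len(nrow(treated)), treated$age, treated$school,
                 FUN = seq_along)
control$k <- ave(seq_len(nrow(control)), control$age, control$school,
                 FUN = seq_along)

# Merging on the counter too pairs the first treated unit of a stratum with
# the first control unit, the second with the second, and so on. Units
# without a partner are dropped because all = FALSE is the default.
matched <- merge(treated, control, by = c("age", "school", "k"))
matched$k <- NULL
print(matched)
```

Which control ends up paired with which treated unit depends purely on the row order within each stratum. To pair at random, shuffle the rows of one data frame first, e.g. control[sample(nrow(control)), ].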

Patrick Connolly wrote:
On Mon, 14-Apr-2008 at 08:37AM +0200, Udo wrote:

|> Quoting Peter Alspach <>:
|> > Udo
|> >
|> > Seems you might want merge()
|> >
|> > HTH .......
|> >
|> > Peter Alspach
|> Thank you Peter and Jorge,
|> but as I had written in my last sentence, "Merge doesn't do the job,
|> because it makes all possible matches". Maybe there is a
|> sophisticated solution with "merge" that I could not find.

>Maybe it would help if we knew what you mean by 'all' in this context.
>To get the NAs in your example, it is NECESSARY to use the all = TRUE
>argument. Without the all = TRUE, the NA rows are omitted.

By 'all' I mean that in the merged data frame (13 obs.) there are 8 cases (2*4) with age=1 and school=10 (all possible combinations).
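
The 2*4 = 8 combinations can be reproduced with a short sketch. The two input data frames are not part of this message, so "treated" and "control" are reconstructed from the merged table below, and their names (like "merged") are assumptions:

```r
# Reconstructed inputs: 2 treated and 4 control units share age = 1, school = 10.
treated <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                      school = c(10, 10, 20, 20, 20, 11),
                      out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 2, 3),
                      school = c(10, 10, 10, 10, 20, 33),
                      out2   = c(1.1, 2.0, 3.5, 4.9, 6.5, 5.2))

# merge() joins on the shared columns (age, school) and returns every
# treated/control combination within a stratum; all = TRUE additionally
# keeps unmatched rows, filled with NA.
merged <- merge(treated, control, all = TRUE)

nrow(merged)                                   # total number of rows
sum(merged$age == 1 & merged$school == 10)     # rows in the first stratum
```

So merge() by itself behaves like a many-to-many join within each stratum, which is why it cannot produce a one-to-one pairing without an extra key.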

>What is it that you don't want in this:
I only "need" lines 1, 6 and 9. To show this, I added "needed" by hand.

   age school out1 out2 needed
1    1     10  9.5  1.1    yes
2    1     10  9.5  2.0     no
3    1     10  9.5  3.5     no
4    1     10  9.5  4.9     no
5    1     10  2.3  1.1     no
6    1     10  2.3  2.0    yes
7    1     10  2.3  3.5     no
8    1     10  2.3  4.9     no
9    2     20  3.3  6.5    yes
10   2     20  4.1  6.5     no
11   2     20  5.9  6.5     no
12   3     33   NA  5.2     no
13   4     11  4.6   NA     no

>Whatever it is, can't you subset them out?
Yes, that's the problem. To describe what I mean, I added the variable "needed"
by hand. I don't know how to compute such a variable to subset on.

My final data frame should look like this:

   age school out1 out2 needed
1    1     10  9.5  1.1    yes
6    1     10  2.3  2.0    yes
9    2     20  3.3  6.5    yes

I hope I have made clear what the problem is and what I mean.

An alternative would be to use packages like "Matching" or "MatchIt", which need a "long" data structure in one data frame rather than a "wide" one spread over two data frames.
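
Getting from the two data frames to such a "long" structure could look roughly like this in base R. It is a sketch only: "treated" and "control" are reconstructed from the merged table above, and the column names "treat" and "out" are made up for illustration, not taken from either package.

```r
# Reconstructed inputs, one data frame per group.
treated <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                      school = c(10, 10, 20, 20, 20, 11),
                      out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 2, 3),
                      school = c(10, 10, 10, 10, 20, 33),
                      out2   = c(1.1, 2.0, 3.5, 4.9, 6.5, 5.2))

# Stack both groups into one data frame: one row per unit, a 0/1 treatment
# indicator, the two confounders, and a single outcome column.
long <- rbind(
  data.frame(treat = 1, age = treated$age, school = treated$school,
             out = treated$out1),
  data.frame(treat = 0, age = control$age, school = control$school,
             out = control$out2)
)
print(long)
```

The treatment indicator and the confounders age and school are then available in a single data frame, which is the layout described above.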

Many thanks!
Udo

mailing list PLEASE do read the posting guide and provide commented, minimal, self-contained, reproducible code.

Received on Wed 16 Apr 2008 - 10:22:19 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 17 Apr 2008 - 20:31:39 GMT.
