Re: [R] converting stata's by syntax to R

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Tue 02 Aug 2005 - 02:43:03 EST

On Mon, 1 Aug 2005, Chris Wallace wrote:

> I am struggling with migrating some stata code to R. I have a data
> frame containing, sometimes, repeat observations (rows) of the same
> family. I want to keep only one observation per family, selecting
> that observation according to some other variable. An example data
> frame is:
>
> # construct example data
> fam <- c(1,2,3,3,4,4,4)
> wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
> keep <- c(1,1,1,0,1,0,0)
> dat <- as.data.frame(cbind(fam,wt,keep))
> dat
>
> I want to keep the observation for which wt is a maximum, and where
> this doesn't identify a unique observation, to keep just one anyway,
> not caring which. Those observations are indicated above by keep==1.
> (Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not
> c(1,1,1,0,0,0,1)).
>
> The stata code I would use is
> bys fam (wt): keep if _n==_N

A reasonably direct translation of the Stata code is

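   # sort by family, then by decreasing wt within each family
   index <- order(fam, -wt)
   # TRUE for the first (largest-wt) row of each family in the sorted order
   keep <- !duplicated(fam[index])
   # rebuild the data frame in sorted order with the new keep flag
   dat <- data.frame(fam=fam[index], wt=wt[index], keep=keep)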

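which sorts wt into decreasing order within family, then flags the first observation in each family. (The Stata version sorts wt ascending and keeps the last observation, which picks out the same maximum-wt row.)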

This is less general than solutions other people have given, but I'd expect it to be faster for large data sets. 'keep' ends up TRUE/FALSE rather than 1/0; if this is a problem use as.numeric() on it.
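
If the aim is to actually drop the other rows, as Stata's keep does, rather than just flag them, the same idea can be applied to the data frame directly. This is only a sketch using the example data from the question; the names 'sorted' and 'result' are made up for illustration.

```r
# example data from the original question
dat <- data.frame(fam = c(1, 2, 3, 3, 4, 4, 4),
                  wt  = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2))

# sort by family, then by decreasing wt within each family
index  <- order(dat$fam, -dat$wt)
sorted <- dat[index, ]

# keep only the first (largest-wt) row of each family
result <- sorted[!duplicated(sorted$fam), ]
result
```

This should leave one row per family, with ties in wt broken by whichever row comes first after sorting.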

         -thomas



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Aug 02 02:48:06 2005
