Re: [R] converting stata's by syntax to R

From: Dimitris Rizopoulos <dimitris.rizopoulos_at_med.kuleuven.be>
Date: Tue 02 Aug 2005 - 01:28:35 EST

If you also need to create the `keep' vector, you could try this approach:

fam <- c(1,2,3,3,4,4,4)
wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
dat <- data.frame(fam, wt)
###########
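# Within each family, flag the row that attains the maximum wt.
# Note: split() returns the groups in order of fam, so `keep' lines up
# with the rows of dat only when dat is already sorted by fam (as here).
keep <- unlist(lapply(split(wt, fam), function(x) {
    ind <- rep(FALSE, length(x))
    ind[which.max(x)] <- TRUE   # which.max() picks the first maximum, so ties give one row
    ind
}))
as.numeric(keep)   # the 0/1 indicator: 1 1 1 0 1 0 0
dat[keep, ]        # one row per family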
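
The split()/unlist() approach above returns its flags in group order, so it lines up with the rows only when the data are already sorted by fam. As a rough sketch of two alternatives that do not depend on row order (the names `keep2', `sorted' and `result' are only illustrative, and this is one possible way rather than the definitive one): ave() puts the per-group result back in the original row order, and sorting followed by duplicated(..., fromLast = TRUE) mimics the Stata idiom of keeping the last row within each sorted group.

```r
fam <- c(1, 2, 3, 3, 4, 4, 4)
wt  <- c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2)
dat <- data.frame(fam, wt)

# Alternative 1: ave() applies the function within each family and
# returns the values in the original row order. The logical result is
# stored as 0/1 because wt is numeric, hence the comparison with 1.
keep2 <- ave(dat$wt, dat$fam,
             FUN = function(x) seq_along(x) == which.max(x)) == 1
dat[keep2, ]

# Alternative 2: sort by fam, then wt, and keep the last row of each
# family, i.e. a row holding that family's maximum wt.
sorted <- dat[order(dat$fam, dat$wt), ]
result <- sorted[!duplicated(sorted$fam, fromLast = TRUE), ]
result
```

With ties, the two alternatives may keep different rows of the same family (for fam 4 here, the first and the second of the two rows with wt 0.4), which is acceptable under the stated requirement.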

I hope it helps.

Best,
Dimitris



Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium

Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


> I am struggling with migrating some Stata code to R. I have a data
> frame containing, sometimes, repeat observations (rows) of the same
> family. I want to keep only one observation per family, selecting
> that observation according to some other variable. An example data
> frame is:
>
> # construct example data
> fam <- c(1,2,3,3,4,4,4)
> wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
> keep <- c(1,1,1,0,1,0,0)
> dat <- as.data.frame(cbind(fam,wt,keep))
> dat
>
> I want to keep the observation for which wt is a maximum, and where
> this doesn't identify a unique observation, to keep just one anyway,
> not caring which. Those observations are indicated above by
> keep==1.
> (Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not
> c(1,1,1,0,0,0,1)).
>
> The stata code I would use is
> bys fam (wt): keep if _n==_N
>
> This is my (long-winded) attempt in R:
>
> # first keep those rows where wt=max_fam(wt)
> maxwt <- by(dat,dat$fam,function(x) max(x[,2]))
> maxwt <- sapply(maxwt,"[[",1)
> maxwt.dat <-
> data.frame("maxwt"=maxwt,"fam"=as.integer(names(maxwt)))
> dat <- merge(dat,maxwt.dat)
> dat <- dat[dat$wt==dat$maxwt,]
> dat
>
> Now I am stuck - I want to keep either row with fam==4, and have
> tried playing around with combinations of sample and apply or by,
> but with no success. I can only find an inefficient for-loop solution:
>
> # identify those rows with >1 observation
> more <- by(dat,dat$fam,function(x) dim(x)[1])
> more <- sapply(more,"[[",1)
> more.dat <- data.frame("more"=more,"fam"=as.integer(names(more)))
> dat <- merge(dat,more.dat)
>
> # sample from those for whom more>1
> result<-dat[dat$more==1,]
> for(f in unique(dat$fam[dat$more>1])) {
>   rows <- rownames(dat[dat$fam==f,])
>   result <- rbind(result,dat[sample(rows,1),])
> }
> result
>
> I am sure that for something so simple in stata to be so complicated
> in R must indicate ignorance of R on my part, but searches of help
> files and RSiteSearch haven't led to any better solution.
>
> Any suggestions would be most helpful! Thanks, C.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>



Received on Tue Aug 02 01:46:39 2005
