From: Dimitris Rizopoulos <dimitris.rizopoulos_at_med.kuleuven.be>

Date: Tue 02 Aug 2005 - 01:28:35 EST

If you also need to create the `keep' vector, you could try this approach:

fam <- c(1,2,3,3,4,4,4)

wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)

dat <- data.frame(fam, wt)

###########

keep <- unlist( lapply(split(wt, fam), function(x){
    ind <- rep(FALSE, length(x))
    ind[which.max(x)] <- TRUE
    ind
}) )

as.numeric(keep)

dat[keep, ]
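Note that the approach above relies on the rows of `dat` already being grouped in increasing order of `fam`, because `split()` followed by `unlist()` returns the flags family by family rather than row by row. The sketch below is not part of the original reply; it shows two alternative base-R variants under the same example data. The names `keep1`, `ord` and `res2` are made up for illustration. The first variant uses `ave()`, which returns its result in the original row order; the second sorts by `fam` and `wt` and keeps the last row per family, which is roughly what the Stata line `bys fam (wt): keep if _n==_N` does.

```r
# Example data from the original question
fam <- c(1, 2, 3, 3, 4, 4, 4)
wt  <- c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2)
dat <- data.frame(fam, wt)

# Variant 1 (illustrative): flag the first maximum of wt within each family.
# ave() writes the per-group results back in the original row order,
# so 'fam' does not have to be sorted. The logical flags are coerced to
# 0/1 by ave(), hence the comparison with 1 to get a logical vector back.
keep1 <- ave(dat$wt, dat$fam,
             FUN = function(x) seq_along(x) == which.max(x)) == 1
dat[keep1, ]

# Variant 2 (illustrative): sort by family, then by weight, and keep the
# last row of each family, i.e. a row with the largest wt.
ord  <- dat[order(dat$fam, dat$wt), ]
res2 <- ord[!duplicated(ord$fam, fromLast = TRUE), ]
res2
```

Both variants keep exactly one row per family with the maximal `wt`; when there are ties they may pick different rows, which the original question says is acceptable.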

I hope it helps.

Best,

Dimitris

Dimitris Rizopoulos

Ph.D. Student

Biostatistical Centre

School of Public Health

Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium

Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Chris Wallace" <c.wallace@qmul.ac.uk>
To: <r-help@stat.math.ethz.ch>
Sent: Monday, August 01, 2005 4:24 PM
Subject: [R] converting stata's by syntax to R

> I am struggling with migrating some stata code to R. I have a data
> frame containing, sometimes, repeat observations (rows) of the same
> family. I want to keep only one observation per family, selecting
> that observation according to some other variable. An example data
> frame is:
>
> # construct example data
> fam <- c(1,2,3,3,4,4,4)
> wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
> keep <- c(1,1,1,0,1,0,0)
> dat <- as.data.frame(cbind(fam,wt,keep))
> dat
>
> I want to keep the observation for which wt is a maximum, and where
> this doesn't identify a unique observation, to keep just one anyway,
> not caring which. Those observations are indicated above by
> keep==1.
> (Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not
> c(1,1,1,0,0,0,1)).
>
> The stata code I would use is
> bys fam (wt): keep if _n==_N
>
> This is my (long-winded) attempt in R:
>
> # first keep those rows where wt=max_fam(wt)
> maxwt <- by(dat,dat$fam,function(x) max(x[,2]))
> maxwt <- sapply(maxwt,"[[",1)
> maxwt.dat <-
>   data.frame("maxwt"=maxwt,"fam"=as.integer(names(maxwt)))
> dat <- merge(dat,maxwt.dat)
> dat <- dat[dat$wt==dat$maxwt,]
> dat
>
> Now I am stuck - I want to keep either row with fam==4, and have
> tried playing around with combinations of sample and apply or by,
> but with no success. I can only find an inefficient for-loop solution:
>
> # identify those rows with >1 observation
> more <- by(dat,dat$fam,function(x) dim(x)[1])
> more <- sapply(more,"[[",1)
> more.dat <- data.frame("more"=more,"fam"=as.integer(names(more)))
> dat <- merge(dat,more.dat)
>
> # sample from those for whom more>1
> result<-dat[dat$more==1,]
> for(f in unique(dat$fam[dat$more>1])) {
>   rows <- rownames(dat[dat$fam==f,])
>   result <- rbind(result,dat[sample(rows,1),])
> }
> result
>
> I am sure that for something so simple in stata to be so complicated
> in R must indicate ignorance of R on my part, but searches of help
> files and RSiteSearch hasn't led to any better solution.
>
> Any suggestions would be most helpful! Thanks, C.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Aug 02 01:46:39 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 15:00:37 EST