From: ronggui <0034058_at_fudan.edu.cn>

Date: Tue 02 Aug 2005 - 01:02:56 EST

Department of Sociology

Fudan University

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Aug 02 01:19:05 2005


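Try:

> dat <- dat[order(dat$fam, dat$wt), ]

# sort the data by fam, then by wt, as Stata's bysort prefix does
# (use dat$fam rather than attach(dat): once dat is re-sorted, the attached
# copy of fam no longer lines up with its rows)

> lis <- by(dat, dat$fam, function(x) x[nrow(x), ])

# equivalent to your Stata command (keep the last, i.e. largest-wt, row of
# each family), but it returns a list

> do.call(rbind, lis)

# turn the list back into a single data frame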
  fam  wt keep
1   1 1.0    1
2   2 1.0    1
3   3 0.6    1
4   4 0.4    0
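Another way avoids both attach() and the list step. Sort so that each family's heaviest row comes first, then drop repeated families with duplicated(). This is a minimal sketch that assumes the example data from your message. The names sorted and result are only illustrative.

```r
# Example data from the question
dat <- data.frame(fam  = c(1, 2, 3, 3, 4, 4, 4),
                  wt   = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2),
                  keep = c(1, 1, 1, 0, 1, 0, 0))

# Sort by family, then by decreasing wt, so each family's heaviest row
# comes first. order() leaves any remaining ties in their original order.
sorted <- dat[order(dat$fam, -dat$wt), ]

# duplicated() is TRUE for every repeat of a fam value already seen,
# so !duplicated() keeps only the first (maximum-wt) row of each family.
result <- sorted[!duplicated(sorted$fam), ]
print(result)
```

On the example data this keeps rows 1, 2, 3 and 5, which are exactly the rows marked keep == 1. No list is built, so no do.call(rbind, ...) step is needed.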

======= 2005-08-01 22:24:27 you wrote: =======

>I am struggling with migrating some Stata code to R. I have a data
>frame containing, sometimes, repeat observations (rows) of the same
>family. I want to keep only one observation per family, selecting
>that observation according to some other variable. An example data
>frame is:
>
># construct example data
>fam <- c(1,2,3,3,4,4,4)
>wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
>keep <- c(1,1,1,0,1,0,0)
>dat <- as.data.frame(cbind(fam,wt,keep))
>dat
>
>I want to keep the observation for which wt is a maximum, and where
>this doesn't identify a unique observation, to keep just one anyway,
>not caring which. Those observations are indicated above by keep==1.
>(Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not
>c(1,1,1,0,0,0,1)).
>
>The Stata code I would use is:
>bys fam (wt): keep if _n==_N
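A more literal translation of that Stata line computes _n and _N per family with ave(). This sketch assumes dat is the example data frame above. The names n_within and N_within are illustrative stand-ins for Stata's _n and _N.

```r
# Example data from the question
dat <- data.frame(fam  = c(1, 2, 3, 3, 4, 4, 4),
                  wt   = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2),
                  keep = c(1, 1, 1, 0, 1, 0, 0))

# "bys fam (wt):" sorts by fam, then by ascending wt within fam
dat <- dat[order(dat$fam, dat$wt), ]

# _n: position of each row within its family; _N: size of its family.
# ave() applies FUN within each group and returns a vector aligned with the rows.
n_within <- ave(dat$wt, dat$fam, FUN = seq_along)
N_within <- ave(dat$wt, dat$fam, FUN = length)

# "keep if _n==_N": keep the last (largest-wt) row of each family
result <- dat[n_within == N_within, ]
print(result)
```

Because tied rows keep their original order, this keeps rows 1, 2, 3 and 6. That is the alternative keep <- c(1,1,1,0,0,1,0) mentioned above.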
>
>This is my (long-winded) attempt in R:
>
># first keep those rows where wt=max_fam(wt)
>maxwt <- by(dat,dat$fam,function(x) max(x[,2]))
>maxwt <- sapply(maxwt,"[[",1)
>maxwt.dat <- data.frame("maxwt"=maxwt,"fam"=as.integer(names(maxwt)))
>dat <- merge(dat,maxwt.dat)
>dat <- dat[dat$wt==dat$maxwt,]
>dat
>
>Now I am stuck - I want to keep either row with fam==4, and have tried
>playing around with combinations of sample and apply or by, but with
>no success. I can only find an inefficient for-loop solution:
>
># identify those rows with >1 observation
>more <- by(dat,dat$fam,function(x) dim(x)[1])
>more <- sapply(more,"[[",1)
>more.dat <- data.frame("more"=more,"fam"=as.integer(names(more)))
>dat <- merge(dat,more.dat)
>
># sample from those for whom more>1
>result<-dat[dat$more==1,]
>for(f in unique(dat$fam[dat$more>1])) {
> rows <- rownames(dat[dat$fam==f,])
> result <- rbind(result,dat[sample(rows,1),])
>}
>result
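Both the merge steps and the for-loop can be vectorised. This is a minimal sketch that assumes the same example data. ave() gives every row its family's maximum wt. Shuffling the rows and then applying duplicated() then picks one of the tied rows at random, which is what the sample() loop does. The names maxrows, shuffled and result are only illustrative.

```r
# Example data from the question
dat <- data.frame(fam  = c(1, 2, 3, 3, 4, 4, 4),
                  wt   = c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2),
                  keep = c(1, 1, 1, 0, 1, 0, 0))

# Step 1: ave(..., FUN = max) gives every row its family's maximum wt,
# which replaces the by()/sapply()/merge() steps.
maxrows <- dat[dat$wt == ave(dat$wt, dat$fam, FUN = max), ]

# Step 2: shuffle the remaining rows, then keep the first row seen for each
# family. In a random permutation that first row is equally likely to be
# any of the family's tied rows.
set.seed(42)  # optional: only makes the random choice reproducible
shuffled <- maxrows[sample.int(nrow(maxrows)), ]
result <- shuffled[!duplicated(shuffled$fam), ]
result <- result[order(result$fam), ]  # restore family order
print(result)
```

For fam == 4 this returns either row 5 or row 6. Every other family has a single maximum, so those rows are always the same.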
>
>I am sure that the fact that something so simple in Stata is so
>complicated in R must indicate ignorance of R on my part, but searches
>of help files and RSiteSearch haven't led to any better solution.
>
>Any suggestions would be most helpful! Thanks, C.

= = = = = = = = = = = = = = = = = = = =

2005-08-01


Blog:http://sociology.yculblog.com


This archive was generated by hypermail 2.1.8: Sun 23 Oct 2005 - 15:00:35 EST