Re: [R] filtering out duplicates & creating a dataframe with unique id

From: Dimitris Rizopoulos <dimitris.rizopoulos_at_med.kuleuven.be>
Date: Tue, 01 Apr 2008 11:46:36 +0200

try the following:

dat <- data.frame(
    id = gl(10, 5),
    y = rnorm(50),
    time = rep(1:5, 10),
    sex = gl(2, 25, labels = c("male", "female")),
    age = round(rep(runif(10, 18, 55), each = 5), 1)
)

dat[tapply(row.names(dat), dat$id, head, n = 1), ]
dat[!duplicated(dat$id), ]
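
For the second part of the question (a data frame made up of the removed duplicates), one possible sketch is to reuse the logical vector returned by duplicated(). This is not part of the reply above; the toy data and the names toy, is_dup, first_rows and removed_rows are made up for illustration only:

```r
# Hypothetical toy data; 'id' plays the role of Patient.Id in the question
toy <- data.frame(
    id    = c(1, 1, 2, 3, 3, 3),
    visit = c(1, 2, 1, 1, 2, 3)
)

# TRUE for every repeat of an id after its first occurrence
is_dup <- duplicated(toy$id)

first_rows   <- toy[!is_dup, ]   # one row per id: the first that appears
removed_rows <- toy[is_dup, ]    # the rows that were filtered out

# the two pieces together account for every original row
stopifnot(nrow(first_rows) + nrow(removed_rows) == nrow(toy))
```

Applied to the data in the question, the same idea would read Feb25[duplicated(Feb25$Patient.Id), ], which should contain 539 - 508 = 31 rows.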

I hope it helps.

Best,
Dimitris



Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium

Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


> Hello,
>
> I am working on a dataframe that contains a number of duplicates
> (e.g. a person may have more than one court appearance). There are
> 539 rows. If I run the code:
>
> > length(unique(Feb25$Patient.Id))
>
> this indicates there are 508 unique individuals. I have been unable
> to work out how to filter out rows where there is a duplicate id so
> that the resulting dataframe consists of only one row per person,
> and this row is the first one that appears.
>
> I was also interested in creating a data frame that consisted of
> these removed duplicates.
>
> Any assistance with the code to do this is much appreciated,
>
>
> regards
>
> Bob Green
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




Received on Tue 01 Apr 2008 - 09:52:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Apr 2008 - 10:30:25 GMT.
