Re: [R] How to delete duplicate cases?

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Thu, 24 Jul 2008 09:34:03 -0500

on 07/24/2008 09:00 AM Daniel Wagner wrote:
> Dear R users,
>
> I have a data frame with a lot of duplicate cases. For each set of duplicates, I want to delete the cases with lower rank and keep the case with the highest rank.
> e.g.
>
> > df1
>    cno rank
> 1 1342 0.23
> 2 1342 0.14
> 3 1342 0.56
> 4 2568 0.15
> 5 2568 0.89
>
> So I want to keep the 3rd and 5th cases, which have the highest rank (0.56 & 0.89), and delete the rest of the duplicate cases.
> Could somebody help me?
>
> Regards
>
> Daniel
> Amsterdam

For the simple two-column case, see ?aggregate:

 > aggregate(df1$rank, list(cno = df1$cno), max)
    cno    x
 1 1342 0.56
 2 2568 0.89
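
To reproduce the call above, here is a minimal sketch that rebuilds df1 from the values in the original post. The formula interface at the end is an assumption about your R version (it is only available in releases of R whose aggregate() accepts a formula); it keeps the column name rank instead of x. The names res and res2 are just for illustration.

```r
# Rebuild the example data from the original post
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# Maximum rank per cno; the value column in the result is named "x"
res <- aggregate(df1$rank, list(cno = df1$cno), max)
print(res)

# Formula interface (assumes an R version where aggregate() accepts a formula);
# the value column keeps its original name, "rank"
res2 <- aggregate(rank ~ cno, data = df1, FUN = max)
print(res2)
```

Note that either form returns only the grouping column and the maximum, so any other columns in the data frame are dropped.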

A more generic approach, which keeps the entire row for each group, might be:

 > do.call(rbind, lapply(split(df1, df1$cno),
                         function(x) x[which.max(x$rank), ]))
       cno rank
 1342 1342 0.56
 2568 2568 0.89
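
Another common idiom, offered here only as a sketch and not as a replacement for the approach above, is to sort so that the highest rank comes first within each cno and then drop the later duplicates with duplicated(). df1 is rebuilt from the original post so the snippet runs on its own; sorted and best are illustrative names.

```r
# Rebuild the example data from the original post
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# Sort by cno, then by descending rank, so the best row leads each group
sorted <- df1[order(df1$cno, -df1$rank), ]

# duplicated() flags every repeat of a cno after its first appearance;
# negating it keeps just the first (highest-rank) row per cno
best <- sorted[!duplicated(sorted$cno), ]
print(best)
```

This keeps all columns and preserves the original row names (here 3 and 5), which can be handy for checking which cases survived.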

For example, using the iris dataset, get the rows, by Species, with the highest Sepal.Length:

 > do.call(rbind, lapply(split(iris, iris$Species),
                         function(x) x[which.max(x$Sepal.Length), ]))
            Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
setosa              5.8         4.0          1.2         0.2     setosa
versicolor          7.0         3.2          4.7         1.4 versicolor
virginica           7.9         3.8          6.4         2.0  virginica
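
One caveat: which.max() returns only the first maximum, so if two rows in a group tie for the top value, only the first is kept. If you would rather keep every tied row, a sketch along the following lines may help. The data frame df2 is hypothetical (the original data with a tie introduced in cno 1342), and group_max and top_rows are illustrative names.

```r
# Hypothetical data: rows 1 and 3 tie for the highest rank within cno 1342
df2 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.56, 0.14, 0.56, 0.15, 0.89))

# ave() returns, for every row, the maximum rank within that row's cno group
group_max <- ave(df2$rank, df2$cno, FUN = max)

# Keep every row whose rank equals its group maximum (ties included)
top_rows <- df2[df2$rank == group_max, ]
print(top_rows)
```

With this data, rows 1, 3 and 5 are retained, whereas the which.max() version would return only rows 1 and 5.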


HTH, Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 24 Jul 2008 - 15:09:54 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 24 Jul 2008 - 15:32:20 GMT.
